PROBABILITY 
AND RELATED TOPICS 
IN PHYSICAL SCIENCES 


By MARK KAC 


Department of Mathematics, Cornell University 


WITH SPECIAL LECTURES BY 
G. E. Uhlenbeck 
Department of Physics, University of Michigan 


A. R. Hibbs 
Jet Propulsion Laboratory, California Institute of Technology 


Balth. van der Pol 
Emeritus Director, International Telecommunication Union 


INTERSCIENCE PUBLISHERS, LTD., LONDON 
INTERSCIENCE PUBLISHERS, INC., NEW YORK 


Copyright © 1959 by Interscience Publishers, Inc.


ALL RIGHTS RESERVED. Reproduction in whole or in part is permitted for 
any purpose of the United States Government, and the information contained 
herein is available to all persons without restrictions. 


LIBRARY OF CONGRESS CATALOG CARD NUMBER 59-10443 


The Summer Seminar in Applied Mathematics of which these are the proceedings 
was arranged by the American Mathematical Society and the University of 
Colorado at Boulder, Colorado, June 23 to July 19, 1957. The Summer Seminar 
was conducted, and the proceedings prepared in part, under the following con- 
tracts and grants, each with (or to) the American Mathematical Society: (1) Con- 
tract AF 49(538)-59 with the United States Air Force, monitored by the Air 
Force Office of Scientific Research of the Air Research and Development Com- 
mand. (2) Contract AT(30-1)-2012 with the United States Atomic Energy 
Commission. (3) Grant NSF-G3193 from the National Science Foundation. 
(4) Contract Nonr-2304 (00) with the United States Navy, issued by the Office of 
Naval Research. (5) Contract DA-19-020-ORD-4373 with the Ordnance Corps,
United States Army; Ordnance Project No. TB2-0001 (1823).


PRINTED IN THE NETHERLANDS 
DIJKSTRA’S DRUKKERIJ N.V., V.H. BOEKDRUKKERIJ GEBROEDERS HOITSEMA, GRONINGEN 


To the Memory 
of my Parents 
and my Brother 


Innocent Victims of War 


Foreword 


The present volume is the first of four which are to contain the 
proceedings of the Summer Seminar on Applied Mathematics, 
sponsored by the American Mathematical Society and held at the 
University of Colorado over the four weeks beginning June 23, 
1957.


The purpose of the Seminar was tutorial, to present to mature 
mathematicians the current status of the theory in the several 
fields covered, and to pose for them some of the more pressing and 
interesting mathematical problems lying open in these fields. It 
could be described as an endeavor to promote cooperation of 
mathematicians with theoretical physicists. The publication of 
these volumes is intended to extend the same information to a 
much wider public than was privileged to actually attend the 
Seminar itself, while at the same time serving as a permanent 
reference for those who did attend. 


The program of the Seminar was organized by a committee 
of the American Mathematical Society with the following member- 
ship: 

P. R. Garabedian 

A. S. Householder 

Mark Kac 

R. E. Langer 

C. C. Lin

Wm. Prager 

J. J. Stoker 

M. H. Martin, Chairman. 


Local arrangements, including the social and recreational 
program, were organized by a committee of the Department of
Applied Mathematics, University of Colorado, as follows:


J. R. Britton 

R. Ben Kriegh 

L. W. Rutland 

L. C. Snively 

K. H. Stahl 

C. A. Hutchinson, Chairman.


The indefatigable energy and enthusiasm of the chairmen, 
and the cooperation of other members of the university staff, 
contributed immeasurably to the successful execution of the plans 
for the seminar and to the enjoyment of the participants. 


The seminar opened Sunday evening, June 23, with an address 
by Professor Richard P. Feynman, California Institute of Technol- 
ogy, on the subject “The Relation of Mathematics to Physics.” 
The formal, technical sessions were held in the mornings, leaving 
the afternoons free for study and for holding informal discussion 
groups on related special topics. Several such were organized and 
met regularly throughout the period of the Seminar. 


A. S. HOUSEHOLDER 


October 13, 1958 


Preface 


This book is an expanded version of twelve lectures delivered 
at the Seminar in Applied Mathematics held in Boulder, Colorado, 
in the summer of 1957. 

Like the lectures, the book is supposed to furnish an introduc- 
tion to probability theory to a mature audience with little or no 
prior knowledge of the subject. 

It is not meant to be a textbook, since it is too fragmentary 
and it reflects perhaps too much the tastes, the inclinations, and 
the prejudices of the author. In fact, it is mostly an exposition of 
some of the author’s own researches and views. 


To the author the main charm of probability theory lies in the 
enormous variety of its applications. Few mathematical disci- 
plines have contributed to as wide a spectrum of subjects, a spec- 
trum ranging from number theory to physics, and even fewer have 
penetrated so decisively the whole of our scientific thinking. 

It is to convey the breadth of the subject and to exhibit its 
manifold applicability that we have set for ourselves as a goal. 

To accomplish this aim we have based the presentation on a
series of special examples and problems without any serious at- 
tempt to fit them into general schemes or theories. 

Apart from the fact that we find this mode of presentation 
much more congenial, we also feel that it is, pedagogically at least, 
much more sound. 

To aim at generality and abstraction in a course of lectures 
designed as an introduction to a new subject would be, to say the 
least, foolish. It would render a distorted and an unmotivated 
picture and it would require expository talents far beyond those of 
the author to keep the presentation from being deadly dull. 

The book is divided into four chapters. The first (Nature of 
Probabilistic Reasoning) serves mainly to illustrate how probabil- 
istic notions are introduced and usefully employed in the treat- 
ment of a variety of problems. 




The second (Some Tools and Techniques of Probability 
Theory) is primarily designed to show how a problem is attacked 
and solved. It provides us also with an opportunity to discuss a 
number of questions related to the role and value of reformulations 
and to bring out the interplay, so peculiar to probability theory, 
between the combinatorial and the analytic aspects. 

The third (and longest!) is Probability in Some Problems of 
Classical Statistical Mechanics. This chapter is supplemented by 
two lectures (reproduced in Appendix I) by Professor G. E. 
Uhlenbeck on the Boltzmann equation. It was indeed fortunate 
that we were able to include in this volume so lucid an exposition 
of the physical ideas underlying this difficult and important field 
and to secure the collaboration of the foremost authority on these 
questions! 

Our Chapter III is to a large extent a running commentary on 
some of the points raised in Uhlenbeck’s first lecture and although 
the chapter is self-contained, the reader is strongly advised to read 
one in conjunction with the other. Uhlenbeck’s second lecture 
brings the reader to the frontiers of our knowledge and opens a 
vast field where opportunities for further contributions are almost 
unlimited. 

Our debt to Uhlenbeck is, however, infinitely greater. Almost 
all of Chapter III has been directly or indirectly influenced by 
discussions and correspondence stretching over almost fifteen years 
of friendship and scientific association. To acknowledge this debt 
is a pleasure that fully compensates for the pain of writing and 
rewriting the chapter. 

Chapter IV is devoted to what might well prove to be a truly 
novel way of looking at many problems of classical analysis and 
physics. 

The fundamental ideas were introduced in the early twenties 
by N. Wiener and from a different point of view by R. P. Feynman 
in 1942. Our own researches were strongly influenced by Feynman, 
although in executing the mathematical program, we had to rely 
on the rigorously established properties of Wiener’s measure in the 
space of continuous functions. 




Feynman’s approach to nonrelativistic quantum mechanics is 
of such elegance and intuitive appeal that we asked Dr. A. R. Hibbs 
to deliver a lecture at the seminar to acquaint the audience with 
this approach. The lecture is reproduced in Appendix II, and we
are most grateful to Dr. Hibbs for his help. The reader may look 
forward to the appearance of a book by Professor Feynman and 
Dr. Hibbs devoted entirely to this fascinating and important 
subject. 

On our part we have concentrated in Chapter IV on exhibiting 
the potentialities of integration in function spaces as a tool of deal- 
ing with problems which, at first sight, have nothing to do with 
probability or measure in function spaces. 

Since 1948 when we first established (rigorously) the connec-
tion between the Schrödinger equation and the average of a certain
functional over the space of continuous functions (the idea being 
suggested by Feynman’s work), we have had a certain program 
in mind. 

The program consisted in studying analytical properties of a 
class of certain differential and integral operators by relating them 
to averages of appropriate functionals over suitable function 
spaces. 

A sketch of this program was given in our paper “On some 
connections between probability theory and differential and inte- 
gral equations,” (Proc. Sec. Berkeley Symp. in Math. Stat. and 
Prob., 1951) and an important part of it (dealing with the asymp-
totic behavior of eigenvalues of Schrödinger’s equation) was fully
executed by Daniel Ray in 1953. 

We have also indicated the possibility of applying the methods 
of integration in function spaces to classical potential theory. 

In more recent years a number of papers by various authors 
have appeared dealing with certain probabilistic aspects of poten- 
tial theory and related problems. These papers pursue an entirely 
different line, striving mainly toward ultimate generality of prob- 
abilistic interpretations rather than using probabilistic methods as 
a guide to discovery, understanding and proof in analysis and 
physics. 




Although in a narrow technical sense, our results may be 
special cases of some more recent and more general theories, we 
prefer calculations and formulas to verbal arguments based on 
delicate (but tedious) uses of measure theory, and the loss in 
generality is more than offset by straightforwardness and con- 
creteness. 

The final two appendices contain lectures delivered by Profes- 
sor Balth. van der Pol at the invitation of the program committee. 

Although not directly related to the rest of the volume they 
exhibit once again that freshness of approach and that original 
twist for which their author is so justly famous. 

This book was prepared from lecture notes and nothing was done 
to change the informality of style. That a book on a scientific 
subject departs so radically from the proverbial “Einführung in 
die Elephantenlehre” is due largely to the author’s dislike of
heavy treatises written in what may be called “an intellectual 
monotone.” If we have erred in the opposite direction we hope 
that the reader will at least understand. 

Many friends and colleagues have contributed in a variety of 
ways. Mr. Anatole Joffe assisted in the preparation of original 
mimeographed notes. Mr. John Riordan made many helpful sug- 
gestions and with patience and tact removed many crimes against 
the English language. 

Above all my thanks go to Dr. Harry Kesten who has critic- 
ally checked most of the manuscript and has found and corrected 
a staggering number of errors. 

Special thanks are due to the audience at Boulder whose faith- 
ful attendance and sustained interest in the subject was a source of 
constant encouragement and last but not least, to Interscience 
Publishers, Inc. for their splendid cooperation. 


Ithaca, N. Y. 


July, 1958 
MARK KAC


Contents 


I. Nature of Probabilistic Reasoning
II. Some Tools and Techniques of Probability Theory
III. Probability in Some Problems of Classical Statistical Mechanics
IV. Integration in Function Spaces and Some Applications
Appendix I. The Boltzmann Equation, by G. E. UHLENBECK
Appendix II. Quantum Mechanics, by A. R. HIBBS
Appendix III. Smoothing and “Unsmoothing”, by BALTH. VAN DER POL
Appendix IV. The Finite Difference Analogy of the Periodic
Wave Equation and the Potential Equation, by BALTH. VAN DER POL


CHAPTER I 
Nature of Probabilistic Reasoning 


1. What makes probability theory a distinct and separate 
discipline? 

It is certainly a branch of analysis and in a narrow sense a 
branch of measure theory. Its most rudimentary (but often most 
difficult!) parts are rooted in combinatorics. 

Yet probability theory transcends all of these, and to a large 
extent it defies precise placement and circumscription. 

And so we shall not even attempt to state what probability 
theory is. Instead, we shall show what it does. 

To facilitate presentation we shall propose a very general (and 
hence nearly trivial!) framework which is present in all probabil- 
istic reasoning. 

This framework consists of a set S (called “sample space”)
and a family F of subsets (called “elementary events” or “elemen-
tary sets”) whose measures are prescribed in advance. Finally,
a set of rules is postulated whereby measures (or, to use an equiv-
alent word, “probabilities”) can be calculated.

In classical probability theory the accepted rules are as 
follows: 

1. Denoting by $\mu$ the measure, we have

$$\mu(S) = 1, \qquad \mu(\emptyset) = 0 \qquad (\emptyset = \text{empty set})$$

2. If $B_n$ are disjoint (i.e., $B_i \cap B_j = \emptyset$, $i \neq j$) and measurable
sets (i.e., $\mu(B_n)$ is defined) then

$$\mu\Bigl(\bigcup_n B_n\Bigr) = \sum_n \mu(B_n)$$

3. If $B$ is a measurable set then the complement $S - B$ is
also measurable.

Usually one allows denumerable unions of sets (completely
additive measure), but in some interesting instances (notably
number theory) only finite unions are allowed.

4. If $\mu(C) = 0$ and $B \subset C$ then $\mu(B) = 0$.

This rule is not universally accepted but a discussion of this 
point would be irrelevant. 

Needless to say the initial assignment of measures (“proba- 
bilities”) to elementary sets (events) must be consistent with the 
preceding rules. 
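
The rules above can be made concrete on a finite sample space. The following is a minimal Python sketch, assuming a fair six-sided die as the sample space; the die and the names `S`, `weights` and `mu` are illustrative choices of ours, not part of the text. It checks rules 1 to 3 with exact rational arithmetic.

```python
from fractions import Fraction

# Hypothetical finite sample space: the six faces of a fair die.
S = frozenset(range(1, 7))
# Elementary events are the single faces, each with prescribed measure 1/6.
weights = {face: Fraction(1, 6) for face in S}

def mu(event):
    """Measure of an event (any subset of S), obtained by finite additivity."""
    return sum((weights[x] for x in event), Fraction(0))

even = frozenset({2, 4, 6})
low_odd = frozenset({1, 3})        # disjoint from `even`

rule_1 = (mu(S) == 1) and (mu(frozenset()) == 0)
rule_2 = mu(even | low_odd) == mu(even) + mu(low_odd)
rule_3 = mu(S - even) == 1 - mu(even)
print(rule_1, rule_2, rule_3)
```

On a finite space every subset is measurable, so the interesting part of the framework (which sets receive a measure at all) does not show up here; the sketch only illustrates the bookkeeping.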

That this extremely “thin” framework can support a rich 
and fruitful theory may appear surprising until one realizes that 
the richness and fruitfulness are due mainly to special measures 
and special sets. 

What the set S is, what are the elementary events, and what 
measures are to be assigned to them depend on the problem under 
consideration. 

We shall now proceed to illustrate the nature of probabilistic 
reasoning by a series of examples. 


2. Example 1. Consider a monatomic ideal gas consisting of $N$
molecules of mass $m$ in thermal equilibrium. Let
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N$ be the vector velocities of the
molecules at some time instant. We wish to find the probability
that the x-component $v_{1x}$ of $\mathbf{v}_1$ is between $\alpha$ and $\beta$,
or, in other words,

$$\operatorname{Prob}\{\alpha < v_{1x} < \beta\} \tag{I.2.1}$$


We now proceed to translate the problem into mathematical 
terms. 

The fact that the gas is ideal means that we neglect the potential
energy of intermolecular forces and hence, denoting by $E$
the total energy we have

$$v_1^2 + \cdots + v_N^2 = 2E/m, \qquad (v_i^2 = \mathbf{v}_i \cdot \mathbf{v}_i) \tag{I.2.2}$$

It is natural to assume that $E$ is proportional to $N$, i.e.,

$$E = \kappa N \tag{I.2.3}$$

where $\kappa$ is independent of $N$.

The state of the gas is determined by $\mathbf{v}_1, \ldots, \mathbf{v}_N$ and can be
pictured as the point $(v_{1x}, v_{1y}, v_{1z}, \ldots, v_{Nx}, v_{Ny}, v_{Nz})$
on the surface of the $3N$-dimensional sphere $S_{3N}(R)$ of radius

$$R = (2E/m)^{1/2} = (2\kappa N/m)^{1/2} \tag{I.2.4}$$

We are interested in the event $\alpha < v_{1x} < \beta$ which defines a
simple set on the surface of our sphere $S_{3N}((2\kappa N/m)^{1/2})$ (namely a
spherical zone).

Our sample space $S$ is thus the surface $S_{3N}((2\kappa N/m)^{1/2})$, and we
must now postulate a measure on this space.

It is the assumption of thermal equilibrium that dictates the 
choice of the measure. 

We simply assume (which is tantamount to the definition of
thermal equilibrium) that the measure $\mu(B)$ of an “elementary”
set $B$ is the ratio

$$\mu(B) = \frac{\text{surface area of } B}{\text{surface area of } S_{3N}((2\kappa N/m)^{1/2})} \tag{I.2.5}$$

As elementary sets we can take, for instance, sets bounded by
great circles (spherical polygons).

The family of elementary sets must be rich enough to enable
us to assign, by the usual postulates of measure theory, measures
(or probabilities) to a wide class of “interesting” sets (events).

In the case under consideration the set of interest is the
spherical zone $\alpha < v_{1x} < \beta$.

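Thus the problem of calculating the probability
$\operatorname{Prob}\{\alpha < v_{1x} < \beta\}$ is reduced to a simple geometrical calculation
of the area of a spherical zone.

The result is well known and is

$$\operatorname{Prob}\{\alpha < v_{1x} < \beta\} = \frac{\int_{\alpha}^{\beta} (1 - m x^2/2\kappa N)^{(3N-3)/2}\,dx}{\int_{-(2\kappa N/m)^{1/2}}^{(2\kappa N/m)^{1/2}} (1 - m x^2/2\kappa N)^{(3N-3)/2}\,dx} \tag{I.2.6}$$

In the limit $N \to \infty$ one obtains

$$\lim_{N\to\infty} \operatorname{Prob}\{\alpha < v_{1x} < \beta\} = \left(\frac{3m}{4\pi\kappa}\right)^{1/2} \int_{\alpha}^{\beta} \exp\left(-\frac{3m}{4\kappa}x^2\right) dx \tag{I.2.7}$$

which reduces to the familiar Maxwell formula by putting

$$\kappa = 3kT/2 \tag{I.2.8}$$
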

where k is a universal constant and T the absolute temperature. 
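
The passage from (I.2.6) to (I.2.7) can be checked numerically. The sketch below is only an illustration under our own assumptions: units with $\kappa = m = 1$, a plain midpoint rule for the integrals, and function names (`zone_probability`, `maxwell_limit`) that are ours. It evaluates the finite-$N$ ratio and compares it with the Gaussian limit.

```python
import math

def zone_probability(alpha, beta, n_molecules, kappa=1.0, mass=1.0, steps=20000):
    """Finite-N probability (I.2.6), evaluated with the midpoint rule."""
    radius = math.sqrt(2.0 * kappa * n_molecules / mass)
    exponent = 0.5 * (3 * n_molecules - 3)

    def integrand(x):
        u = mass * x * x / (2.0 * kappa * n_molecules)   # equals (x / R)^2
        if u >= 1.0:
            return 0.0
        # exp/log1p keeps the large power numerically stable
        return math.exp(exponent * math.log1p(-u))

    def midpoint(lo, hi):
        if hi <= lo:
            return 0.0
        h = (hi - lo) / steps
        return h * sum(integrand(lo + (i + 0.5) * h) for i in range(steps))

    lo, hi = max(alpha, -radius), min(beta, radius)
    return midpoint(lo, hi) / midpoint(-radius, radius)

def maxwell_limit(alpha, beta, kappa=1.0, mass=1.0):
    """Limit (I.2.7): a Gaussian with variance 2*kappa/(3*mass)."""
    scale = math.sqrt(3.0 * mass / (4.0 * kappa))
    return 0.5 * (math.erf(scale * beta) - math.erf(scale * alpha))

for n in (1, 10, 100, 1000):
    print(n, zone_probability(-1.0, 1.0, n), maxwell_limit(-1.0, 1.0))
```

For $N = 1$ the exponent vanishes and the x-component is uniformly distributed on $(-R, R)$, which is far from Gaussian; the agreement with (I.2.7) improves as $N$ grows.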

From the purely mathematical point of view the example is 
quite primitive because the set of interest happens to be extremely 
simple. 

A more sophisticated problem related to the foregoing is the 
following: 

Consider the proportion of molecules whose x-components of
velocity lie between $\alpha$ and $\beta$. What is the probability that this
proportion deviates by more than $\varepsilon > 0$ from the already computed
$\operatorname{Prob}\{\alpha < v_{1x} < \beta\}$?

Denoting by $\psi_{\alpha,\beta}(x)$ the function which is 1 for $\alpha < x < \beta$
and 0 otherwise we see that the set of interest is now defined by
the inequality:

$$\left| \frac{1}{N}\sum_{j=1}^{N} \psi_{\alpha,\beta}(v_{jx}) - \operatorname{Prob}\{\alpha < v_{1x} < \beta\} \right| > \varepsilon \tag{I.2.9}$$

Now, this is quite a complicated set, and the calculation of its 
measure is considerably more involved. Without going into details 
let us mention that it is easy to prove (without calculating the 
precise value of the probability!) that for every e œ 0 the proba- 
bility of (1.2.9) approaches 0 as N->oo (this is an example of the 
“weak law of large numbers’’). 
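
A small simulation makes the statement tangible. This is a sketch under our own assumptions, not the author's argument: units with $\kappa = m = 1$, a uniform point on the sphere obtained by normalising a vector of independent Gaussians (a standard device), and comparison with the limiting value (I.2.7) rather than the finite-$N$ probability. The function names are ours.

```python
import math
import random

def proportion_in_zone(alpha, beta, n_molecules, kappa=1.0, mass=1.0, rng=random):
    """Draw one point uniformly from the sphere of radius (2*kappa*N/m)^(1/2)
    in 3N dimensions; return the proportion of x-components in (alpha, beta)."""
    dim = 3 * n_molecules
    gauss = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(g * g for g in gauss))
    radius = math.sqrt(2.0 * kappa * n_molecules / mass)
    # Normalising an isotropic Gaussian vector yields the uniform surface measure.
    velocities = [radius * g / norm for g in gauss]
    x_components = velocities[0::3]          # v_{1x}, v_{2x}, ..., v_{Nx}
    hits = sum(1 for v in x_components if alpha < v < beta)
    return hits / n_molecules

def maxwell_limit(alpha, beta, kappa=1.0, mass=1.0):
    """Limiting probability (I.2.7)."""
    scale = math.sqrt(3.0 * mass / (4.0 * kappa))
    return 0.5 * (math.erf(scale * beta) - math.erf(scale * alpha))

rng = random.Random(1957)
target = maxwell_limit(-1.0, 1.0)
for n in (10, 100, 1000, 10000):
    print(n, abs(proportion_in_zone(-1.0, 1.0, n, rng=rng) - target))
```

Each run looks at a single sample point, so the printed deviations fluctuate; the theorem only says that large deviations become improbable as $N$ grows.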

It should be borne in mind that this conclusion is still a conse-
quence of the fundamental assumption (I.2.5). The reason that we
find the smallness of probability of (I.2.9) gratifying is that we
intuitively cling to interpreting probabilities as frequencies and the 
theorem stated above seems to lend support to our intuition. 

Actually, the theorem says disappointingly little. All it says, 
in fact, is the following: 

If a probability of a certain event was calculated in accordance 
with certain assumptions and rules, then the probability (again 
calculated according to the same assumptions and rules) that the 
frequency with which the event will occur in a large assembly of
trials will differ significantly from the calculated probability is
small.

Modest as the statement is, it is essentially all one can expect 
from a purely mathematical theory. 

The applicability of such a theory to natural sciences must 
ultimately be tested by an experiment. But this is true of all 
mathematical theories when applied outside the realm of mathe- 
matics, and the vague feeling of discomfort one encounters (mostly 
among philosophers!) when first subjected to statistical reasoning 
must be attributed to the relative novelty of the ideas. 

To me there is no methodological distinction between the 
applicability of differential equations to astronomy and of proba- 
bility theory to thermodynamics or quantum mechanics. 

It works! And brutally pragmatic as this point of view is, no 
better substitute has been found. 


3. Example 2. What is the average number of real roots of an
algebraic polynomial

$$a_0 + a_1 t + \cdots + a_{n-1} t^{n-1} \tag{I.3.1}$$

with real coefficients?

Again, we must make the problem precise by specifying the 
underlying assumptions. 

Among the infinitude of possible interpretations we choose the 
following because it is analytically the simplest. 

With the polynomial (I.3.1), we associate the point (or vector)
$(a_0, a_1, \ldots, a_{n-1}) = \mathbf{a}$ and confine our attention to the sphere
$S_n(1)$:

$$\|\mathbf{a}\|^2 = \sum_{k=0}^{n-1} a_k^2 = 1 \tag{I.3.2}$$

Let $N(\mathbf{a}) = N(a_0, a_1, \ldots, a_{n-1})$ denote the number of real
roots of the polynomial (I.3.1).

We now interpret the average as

$$\frac{1}{|S_n(1)|} \int_{S_n(1)} N(\mathbf{a})\,d\sigma \tag{I.3.3}$$

where

$$|S_n(1)| = 2\pi^{n/2}/\Gamma(n/2) \tag{I.3.4}$$

is the surface area of $S_n(1)$ and $d\sigma$ the surface element of $S_n(1)$.

Our sample space is $S_n(1)$ and the measure is simply the
ordinary superficial Lebesgue measure (i.e., a natural extension of
the geometric notion of area on the surface of a high-dimensional
sphere).
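We can restate the problem in a slightly more convenient way.
Consider the integral

$$M_n = (2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} N(\mathbf{a}) \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2)\,da_0 \cdots da_{n-1} \tag{I.3.5}$$

and note that for every scalar $\alpha \neq 0$, $N(\alpha\mathbf{a}) = N(\mathbf{a})$. We rewrite
(I.3.5) in the form

$$M_n = (2\pi)^{-n/2} \int_0^{\infty} r^{n-1} \exp(-\tfrac{1}{2}r^2)\,dr \int_{S_n(1)} N(\mathbf{a})\,d\sigma$$

Thus

$$M_n = \frac{1}{|S_n(1)|} \int_{S_n(1)} N(\mathbf{a})\,d\sigma \tag{I.3.6}$$

since

$$|S_n(1)|\,(2\pi)^{-n/2} \int_0^{\infty} r^{n-1} \exp(-\tfrac{1}{2}r^2)\,dr = 1$$

Formula (I.3.5) shows that we could have taken for the sample
space $S$ the whole $n$-dimensional Euclidean space with measure of
a set $B$ defined as

$$\mu\{B\} = (2\pi)^{-n/2} \int \cdots \int_B \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2)\,da_0 \cdots da_{n-1} \tag{I.3.7}$$

This formulation, in accepted terminology, states that the
coefficients are independent and each is normally distributed with
mean 0 and variance 1.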
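
The normalisation identity used to pass from (I.3.5) to (I.3.6) is easy to confirm numerically. The following sketch assumes a midpoint rule with a cut-off at $r = 40$ (our choice; the integrand is negligible beyond it for the small $n$ tried here) and uses function names of our own.

```python
import math

def sphere_area(n):
    """Surface area |S_n(1)| of the unit sphere in n dimensions, formula (I.3.4)."""
    return 2.0 * math.pi ** (n / 2.0) / math.gamma(n / 2.0)

def radial_integral(n, upper=40.0, steps=40000):
    """Midpoint-rule value of the integral of r^(n-1) exp(-r^2/2) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        total += r ** (n - 1) * math.exp(-0.5 * r * r)
    return h * total

def normalisation(n):
    """|S_n(1)| (2 pi)^(-n/2) times the radial integral; it should be close to 1."""
    return sphere_area(n) * (2.0 * math.pi) ** (-n / 2.0) * radial_integral(n)

for n in (2, 3, 5, 10):
    print(n, normalisation(n))
```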


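If we denote by $M_n^{(1)}$ the average number of real roots in
the interval $(-1, 1)$ and by $M_n^{(2)}$ the average number of roots
outside $(-1, 1)$, we first show that

$$M_n^{(1)} = M_n^{(2)} \tag{I.3.8}$$

and consequently

$$M_n = 2M_n^{(1)} \tag{I.3.9}$$

To see that (I.3.8) holds, note that if $N^{(1)}(a_0, a_1, \ldots, a_{n-1})$
is the number of real roots of $\sum_{k=0}^{n-1} a_k t^k$ in $(-1, 1)$ and
$N^{(2)}(a_0, a_1, \ldots, a_{n-1})$ the number of roots outside $(-1, 1)$ one
has (since $\sum_{k=0}^{n-1} a_k t^k = t^{n-1} \sum_{k=0}^{n-1} a_{n-1-k} (1/t)^k$)

$$N^{(1)}(a_0, a_1, \ldots, a_{n-1}) = N^{(2)}(a_{n-1}, a_{n-2}, \ldots, a_0)$$

Thus

$$M_n^{(1)} = (2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} N^{(1)}(a_0, \ldots, a_{n-1}) \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2)\,da_0 \cdots da_{n-1}$$
$$= (2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} N^{(2)}(a_{n-1}, \ldots, a_0) \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2)\,da_0 \cdots da_{n-1} = M_n^{(2)}$$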

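We now calculate $M_n^{(1)}$.

To do this we need the following lemma:

If $f(t)$, continuous for $a \le t \le b$ and continuously differentiable
for $a < t < b$, has a finite number of turning points (i.e.,
only a finite number of points at which $f'(t)$ vanishes in $(a, b)$) then
the number of zeros of $f(t)$ in $(a, b)$ (which we denote by $n(a, b; f)$)
is given by the formula

$$n(a, b; f) = (2\pi)^{-1} \int_{-\infty}^{\infty} d\xi \int_a^b \cos[\xi f(t)]\,|f'(t)|\,dt \tag{I.3.10}$$

Multiple zeros are counted once and if either $a$ or $b$ is a zero it
is counted as $\tfrac{1}{2}$. Let $\alpha_1, \ldots, \alpha_p$;
$\alpha_0 = a < \alpha_1 < \alpha_2 < \cdots < \alpha_p < b = \alpha_{p+1}$,
be the abscissas of the turning points.

We have

$$\int_a^b \cos[\xi f(t)]\,|f'(t)|\,dt = \sum_{j=0}^{p} \int_{\alpha_j}^{\alpha_{j+1}} \cos[\xi f(t)]\,|f'(t)|\,dt = \sum_{j=0}^{p} \pm\,\xi^{-1} [\sin \xi f(\alpha_{j+1}) - \sin \xi f(\alpha_j)]$$

where the sign $+$ is attached if $f(t)$ is increasing between $\alpha_j$ and
$\alpha_{j+1}$ and the sign $-$ if it is decreasing. Thus

$$(2\pi)^{-1} \int_{-\infty}^{\infty} d\xi \int_a^b \cos[\xi f(t)]\,|f'(t)|\,dt = (2\pi)^{-1} \sum_{j=0}^{p} \pm \int_{-\infty}^{\infty} \xi^{-1} [\sin \xi f(\alpha_{j+1}) - \sin \xi f(\alpha_j)]\,d\xi$$

Since $\int_{-\infty}^{\infty} \xi^{-1} \sin(c\xi)\,d\xi = \pi \operatorname{sgn} c$, the $j$-th term on the
right equals 1 if $f$ changes sign between $\alpha_j$ and $\alpha_{j+1}$, $\tfrac{1}{2}$ if $f$
vanishes at one of these two end points, and 0 otherwise; adding
the terms gives the count described in the lemma.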


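Let now $a = -1$, $b = 1$ and

$$f(t) = \sum_{k=0}^{n-1} a_k t^k$$

Then

$$M_n^{(1)} = (2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2)\,(2\pi)^{-1} \int_{-\infty}^{\infty} d\xi \int_{-1}^{1} \cos[\xi f(t)]\,|f'(t)|\,dt\;da_0 \cdots da_{n-1}$$

and it can be shown¹ that the order of integration can be inter-
changed, yielding

$$M_n^{(1)} = (2\pi)^{-1} \int_{-1}^{1} dt \int_{-\infty}^{\infty} d\xi\,R_n(\xi, t) \tag{I.3.11}$$

where

$$R_n(\xi, t) = (2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2) \cos[\xi f(t)]\,|f'(t)|\,da_0 \cdots da_{n-1}$$

¹ Although the proof is easy, it is not entirely trivial. We omit it
because it is only of secondary importance as far as our presentation is
concerned.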

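To complete the calculation we use the formula

$$|y| = \pi^{-1} \int_{-\infty}^{\infty} (1 - \cos \eta y)\,\eta^{-2}\,d\eta \tag{I.3.12}$$

obtaining

$$R_n(\xi, t) = \pi^{-1} \int_{-\infty}^{\infty} \eta^{-2}\,d\eta\,(2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2) \{\cos[\xi f(t)] - \cos[\xi f(t)] \cos[\eta f'(t)]\}\,da_0 \cdots da_{n-1} \tag{I.3.13}$$

Note now that

$$\cos[\xi f(t)] \cos[\eta f'(t)] = \tfrac{1}{2} \operatorname{Re} \{\exp i[\xi f(t) + \eta f'(t)] + \exp i[\xi f(t) - \eta f'(t)]\}$$

and hence

$$(2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2) \cos[\xi f(t)] \cos[\eta f'(t)]\,da_0 \cdots da_{n-1}$$
$$= \tfrac{1}{2} \operatorname{Re}\,(2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2) \Bigl\{\exp\Bigl[i \sum_{k=0}^{n-1} (\xi t^k + k\eta t^{k-1}) a_k\Bigr] + \exp\Bigl[i \sum_{k=0}^{n-1} (\xi t^k - k\eta t^{k-1}) a_k\Bigr]\Bigr\}\,da_0 \cdots da_{n-1}$$
$$= \tfrac{1}{2} \Bigl\{\exp\Bigl[-\tfrac{1}{2} \sum_{k=0}^{n-1} (\xi t^k + k\eta t^{k-1})^2\Bigr] + \exp\Bigl[-\tfrac{1}{2} \sum_{k=0}^{n-1} (\xi t^k - k\eta t^{k-1})^2\Bigr]\Bigr\}$$

Setting $\eta = 0$ we also obtain

$$(2\pi)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp(-\tfrac{1}{2}\|\mathbf{a}\|^2) \cos[\xi f(t)]\,da_0 \cdots da_{n-1} = \exp\Bigl(-\tfrac{1}{2}\xi^2 \sum_{k=0}^{n-1} t^{2k}\Bigr)$$

Turning to formula (I.3.13) we obtain after a few simple trans-
formations:

$$R_n(\xi, t) = \pi^{-1} \int_{-\infty}^{\infty} \eta^{-2} \Bigl\{\exp\Bigl[-\tfrac{1}{2}\xi^2 \sum_{k=0}^{n-1} t^{2k}\Bigr] - \exp\Bigl[-\tfrac{1}{2} \sum_{k=0}^{n-1} (\xi t^k + k\eta t^{k-1})^2\Bigr]\Bigr\}\,d\eta \tag{I.3.14}$$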

where the integral is to be interpreted as the principal value.
Setting

$$A_n(t) = \sum_{k=0}^{n-1} t^{2k}, \qquad B_n(t) = \sum_{k=1}^{n-1} k t^{2k-1}, \qquad C_n(t) = \sum_{k=1}^{n-1} k^2 t^{2k-2}$$

we get by substituting (I.3.14) in (I.3.11) and performing the
elementary integrations

$$M_n^{(1)} = \pi^{-1} \int_{-1}^{+1} \frac{[A_n(t) C_n(t) - B_n^2(t)]^{1/2}}{A_n(t)}\,dt$$

and finally

$$M_n = 2M_n^{(1)} = \frac{4}{\pi} \int_0^1 \frac{[A_n(t) C_n(t) - B_n^2(t)]^{1/2}}{A_n(t)}\,dt \tag{I.3.15}$$

It is not difficult to show that as $n \to \infty$ we have the asymp-
totic formula

$$M_n \sim \frac{2}{\pi} \log n \tag{I.3.16}$$

The reader may wonder to what extent the result (I.3.16)
depends on the definition (I.3.5) or (I.3.3) of the average $M_n$.

If we define

$$M_n = 2^{-n} \int_{-1}^{1} \cdots \int_{-1}^{1} N(\mathbf{a})\,da_0 \cdots da_{n-1}$$

(this in accepted terminology corresponds to assuming that the
coefficients are independent and uniformly distributed in $(-1, 1)$),
one can still prove (I.3.16) although the proof becomes much more
involved.
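
Formula (I.3.15) lends itself to direct numerical evaluation. The sketch below is an illustration under our own assumptions (a midpoint rule on $(0, 1)$ with 4000 nodes and function names of our choosing); it computes $M_n$ from the sums $A_n$, $B_n$, $C_n$ and prints it next to the leading term $(2/\pi) \log n$ of (I.3.16).

```python
import math

def kac_density(n, t):
    """sqrt(A_n C_n - B_n^2) / (pi * A_n), the integrand of (I.3.15) without the factor 4."""
    a = b = c = 0.0
    for k in range(n):
        a += t ** (2 * k)
        if k >= 1:
            b += k * t ** (2 * k - 1)
            c += k * k * t ** (2 * k - 2)
    # Cauchy-Schwarz gives A*C >= B^2; max() only guards against rounding.
    return math.sqrt(max(a * c - b * b, 0.0)) / (math.pi * a)

def expected_real_roots(n, steps=4000):
    """M_n = (4/pi) * integral over (0, 1), evaluated with the midpoint rule."""
    h = 1.0 / steps
    return 4.0 * h * sum(kac_density(n, (i + 0.5) * h) for i in range(steps))

for n in (2, 10, 100):
    print(n, expected_real_roots(n), (2.0 / math.pi) * math.log(n))
```

For $n = 2$ the polynomial is linear, so it has exactly one real root for almost every choice of coefficients, and the integral indeed reduces to $(4/\pi) \int_0^1 dt/(1 + t^2) = 1$. Since (I.3.16) is only an asymptotic statement, the two printed columns should not be expected to agree closely for moderate $n$.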

It has been proved recently by Erdős and Offord that (I.3.16)
holds even if

$$M_n = 2^{-n} \sum N(\pm 1, \pm 1, \ldots, \pm 1)$$

i.e., the coefficients are allowed only values $+1$ and $-1$, and all
sequences $\pm 1, \pm 1, \ldots, \pm 1$ are assigned equal weights ($2^{-n}$).

We have thus a certain insensitivity of the result (I.3.16)
which is highly gratifying, since it makes one feel that the result 
in some way expresses the tendency of polynomials with real 
coefficients to have relatively few real roots. 

The above example is very instructive because it illustrates 
the remarkable possibilities of a statistical treatment of problems 
which on an individual basis are hopeless. 

To determine the number of real roots of an individual real 
polynomial of degree 10® is almost out of the question. One can 
nevertheless make meaningful statements about the ensemble of 
all real polynomials of degree 10°. 

We have here, in miniature, one of the most fruitful scientific 
ideas; an idea which permeates modern scientific thought and 
which achieves its most spectacular success in statistical mechanics. 

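Returning to our original formulation ($\mathbf{a}$ equidistributed on
the surface $S_n(1)$) it is shown by the same argument as before that
denoting by $\langle N_n(\alpha, \beta) \rangle$ the average number of real roots which
fall within $(\alpha, \beta)$ we have

$$\langle N_n(\alpha, \beta) \rangle = \pi^{-1} \int_{\alpha}^{\beta} \frac{[1 - h_n^2(t)]^{1/2}}{|1 - t^2|}\,dt$$

where

$$h_n(t) = \frac{n t^{n-1} (1 - t^2)}{1 - t^{2n}}$$

Thus one can look upon

$$P_n(t) = \pi^{-1}\,\frac{[1 - h_n^2(t)]^{1/2}}{|1 - t^2|}$$

as the “average density” of real roots.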


Figure 1 




A graph of $P_n(t)$ is quite instructive and it shows that the
real roots tend to concentrate very strongly around $+1$ and $-1$.

How pronounced this tendency is, is best illustrated by notic-
ing that

$$P_n(+1) = P_n(-1) = \pi^{-1} [(n^2 - 1)/12]^{1/2} \sim n/(2\pi\sqrt{3})$$

while the total area under the curve is asymptotically $2\pi^{-1} \log n$.

This observation is entirely in agreement with our intuition
because our assumptions imply that on the average the $a_k$’s are
roughly of the same order of magnitude and consequently for

$$a_0 + a_1 t + \cdots + a_{n-1} t^{n-1}$$

to be zero $t$ must be close to $+1$ or $-1$. Otherwise, cancellation
becomes difficult.
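
The concentration near $\pm 1$ can be inspected with a few lines of code. This sketch assumes the closed form of the density in terms of $h_n(t) = n t^{n-1}(1 - t^2)/(1 - t^{2n})$, with the absolute value $|1 - t^2|$ in the denominator so that it also applies outside $(-1, 1)$; the function names are ours. Because the expression is of the form 0/0 at $t = \pm 1$, the peak is probed just inside $t = 1$.

```python
import math

def h(n, t):
    """h_n(t) = n t^(n-1) (1 - t^2) / (1 - t^(2n)), for t different from +1 and -1."""
    return n * t ** (n - 1) * (1.0 - t * t) / (1.0 - t ** (2 * n))

def root_density(n, t):
    """Average density P_n(t) of real roots, for t different from +1 and -1."""
    return math.sqrt(max(1.0 - h(n, t) ** 2, 0.0)) / (math.pi * abs(1.0 - t * t))

def peak_value(n):
    """Closed form for P_n(+1) = P_n(-1) quoted in the text."""
    return math.sqrt((n * n - 1) / 12.0) / math.pi

n = 10
print(root_density(n, 0.0), 1.0 / math.pi)               # density at the origin
print(root_density(n, 1.0 - 1e-4), peak_value(n))        # just inside t = 1
print(root_density(n, 2.0), root_density(n, 0.5) / 4.0)  # symmetry under t -> 1/t
```

The last line illustrates the relation $P_n(1/t) = t^2 P_n(t)$, which is the density form of the equality (I.3.8) between the average numbers of roots inside and outside $(-1, 1)$.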


We have here a most convincing demonstration of the old 
saying of Laplace that “probability is common sense made precise.” 


4. Example 3. Let $\omega(n)$ denote the number of prime divisors
of $n$ counting multiplicity (e.g., $\omega(20) = \omega(2^2 \cdot 5) = 2 + 1 = 3$)
and $\nu(n)$ the number of prime divisors not counting multiplicity
(e.g., $\nu(20) = \nu(2^2 \cdot 5) = 2$). What is the “probability” $p_k$ that
$\omega(n) - \nu(n) = k$?

The sample space $S$ is here the set of positive integers,
1, 2, 3, ... and the measure (or probability) of a set $B$ is its or-
dinary density. The density is defined as follows: let $D_N(B)$ denote
the number of elements of $B$ among the first $N$ integers.

Then we define the density $D(B)$ ($= \mu(B)$) of $B$ by the
formula

$$\mu(B) = D(B) = \lim_{N \to \infty} D_N(B)/N$$

provided, of course, the limit exists.

The density is not a completely additive measure, i.e., if $B$ is
a union of countably many disjoint sets:

$$B = \bigcup_i B_i \qquad (B_i \cap B_j = \emptyset,\ i \neq j)$$

the density of $B$ is not necessarily the sum of the densities of the
$B_i$’s. (For example, $S$, whose density is 1, is the union of sets $B_i$,
where $B_i$ consists of the one integer $i$. The density of each $B_i$ is
clearly 0.)

This fact complicates matters and care has to be taken in
applying probabilistic methods to number-theoretic problems.
Let $p_1, p_2, \ldots$ be the primes ($p_1 = 2$, $p_2 = 3$, etc.) and write

$$n = p_1^{\alpha_1(n)} p_2^{\alpha_2(n)} \cdots$$

This defines the functions $\alpha_1(n), \alpha_2(n), \ldots$.

Consider now the set on which simultaneously $\alpha_1(n) = k_1$,
$\alpha_2(n) = k_2, \ldots, \alpha_r(n) = k_r$. This set consists of integers of the
form

$$n = p_1^{k_1} \cdots p_r^{k_r} m \tag{I.4.1}$$

where $m$ is not divisible by any of the primes $p_1, p_2, \ldots, p_r$.

To find the number of integers of the form (I.4.1) which are
less than $N$ it is enough to find the number of integers less than

$$\frac{N}{p_1^{k_1} \cdots p_r^{k_r}}$$

which are not divisible by any of the primes $p_1, p_2, \ldots, p_r$.

But the number of integers less than $M$ not divisible by any
of the primes $p_1, \ldots, p_r$ is asymptotically

$$M \prod_{j=1}^{r} (1 - 1/p_j)$$

(as $M \to \infty$) (this is a consequence of the primitive and classical
sieve of Eratosthenes) and hence it follows that

$$D\{\alpha_1(n) = k_1, \ldots, \alpha_r(n) = k_r\} = \prod_{j=1}^{r} p_j^{-k_j} (1 - 1/p_j) \tag{I.4.2}$$

Formula (I.4.2) can be rewritten in a more illuminating form
in two parts

$$D\{\alpha_j(n) = k_j\} = p_j^{-k_j} (1 - 1/p_j) \tag{I.4.3}$$

$$D\{\alpha_1(n) = k_1, \ldots, \alpha_r(n) = k_r\} = \prod_{j=1}^{r} D\{\alpha_j(n) = k_j\} \tag{I.4.4}$$
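
The product formula for the joint density can be checked by brute force. The sketch below assumes nothing beyond (I.4.2); the helper names are ours. It counts the integers up to $N$ with prescribed exponents of a few primes and compares the frequency with the predicted product, using exact fractions.

```python
from fractions import Fraction

def exponent(n, p):
    """alpha(n) for the prime p: the exponent of p in the factorisation of n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def empirical_density(primes, ks, N):
    """Fraction of n <= N with exponent(n, p_j) = k_j for every listed prime."""
    hits = sum(1 for n in range(1, N + 1)
               if all(exponent(n, p) == k for p, k in zip(primes, ks)))
    return Fraction(hits, N)

def predicted_density(primes, ks):
    """Right-hand side of (I.4.2): product of p^(-k) (1 - 1/p)."""
    result = Fraction(1)
    for p, k in zip(primes, ks):
        result *= Fraction(1, p ** k) * (1 - Fraction(1, p))
    return result

primes, ks = (2, 3, 5), (1, 0, 2)
print(float(empirical_density(primes, ks, 90000)),
      float(predicted_density(primes, ks)))
```

For these particular exponents the condition depends only on $n$ modulo $4 \cdot 3 \cdot 125 = 1500$, so the frequency coincides with the density whenever $N$ is a multiple of 1500; for other $N$ the agreement is only approximate.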


The product property (I.4.4) is particularly suggestive, since
it expresses the fact that the functions $\alpha_1(n), \alpha_2(n), \ldots$ are
statistically independent, and herein lies the possibility of applying
methods familiar from probability theory to problems in number
theory which at first sight have nothing to do with chance, in-
dependence, or other statistical notions.

To proceed with our problem we define functions $\beta_k(n)$ by the
formula

$$\beta_k(n) = \begin{cases} \alpha_k(n) - 1 & \text{if } \alpha_k(n) \geq 1 \\ 0 & \text{if } \alpha_k(n) = 0 \end{cases}$$

and note that

$$\omega(n) - \nu(n) = \sum_{k=1}^{\infty} \beta_k(n) \tag{I.4.5}$$

The reader will easily verify (from (I.4.4)) that the functions
$\beta_k(n)$ are also independent, i.e.,

$$D\{\beta_1(n) = k_1, \ldots, \beta_r(n) = k_r\} = \prod_{j=1}^{r} D\{\beta_j(n) = k_j\} \tag{I.4.6}$$

Observe now that

$$(2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi k) \exp[i\xi(\omega(n) - \nu(n))]\,d\xi = \begin{cases} 1 & \text{if } \omega(n) - \nu(n) = k \\ 0 & \text{if } \omega(n) - \nu(n) \neq k \end{cases}$$

Denote by $R_k(N)$ the number of integers up to $N$ for which
$\omega(n) - \nu(n) = k$ and note that

$$R_k(N)/N = (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi k)\,N^{-1} \sum_{n=1}^{N} \exp[i\xi(\omega(n) - \nu(n))]\,d\xi$$

If we could prove that

$$\lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi(\omega(n) - \nu(n))] \tag{I.4.7}$$

exists for every real $\xi$ then denoting by $F(\xi)$ this limit we would
have (by Lebesgue’s theorem on dominated convergence)

$$p_k = \lim_{N \to \infty} R_k(N) N^{-1} = (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi k) F(\xi)\,d\xi \tag{I.4.8}$$

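Independence of the $\beta_k(n)$ (as expressed by (I.4.6)) leads one
to expect that

$$\lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi(\omega(n) - \nu(n))] = \lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp\Bigl[i\xi \sum_{k=1}^{\infty} \beta_k(n)\Bigr] = \prod_{k=1}^{\infty} \lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi \beta_k(n)]$$

and since it is easily verifiable that

$$\lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi \beta_k(n)] = \sum_{l=1}^{\infty} \exp(i\xi(l - 1)) (1/p_k^{l} - 1/p_k^{l+1}) + (1 - 1/p_k) = (1 - 1/p_k)[1 + 1/(p_k - \exp i\xi)]$$

we would thus expect that

$$F(\xi) = \prod_{k=1}^{\infty} (1 - 1/p_k)[1 + 1/(p_k - \exp i\xi)] \tag{I.4.9}$$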
ran 


Were the density completely additive, the relation

$$\lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp\Bigl[i\xi \sum_{k=1}^{\infty} \beta_k(n)\Bigr] = \prod_{k=1}^{\infty} \lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi \beta_k(n)] \tag{I.4.10}$$

would follow trivially from the independence (I.4.6).

Since density is not completely additive, a justification of
(I.4.10) is required.

Fortunately the justification is quite easy.


Set

$$L_r(n) = \sum_{k>r} \beta_k(n)$$

and note that

$$\sum_{n=1}^{N} L_r(n) = \sum_{k>r} \sum_{n=1}^{N} \beta_k(n)$$


Recalling that

$$\beta_k(n) = l - 1 \qquad (l \ge 2)$$

for $n$ which are divisible by $p_k^{l}$ but not by $p_k^{l+1}$ we see that

$$\sum_{n=1}^{N} \beta_k(n) = \sum_{l=2}^{\infty} (l - 1)\{[N/p_k^{l}] - [N/p_k^{l+1}]\}$$

($[x]$ denotes as usual the greatest integer $\le x$) and consequently:

$$\sum_{n=1}^{N} \beta_k(n) < N \sum_{l=2}^{\infty} (l - 1)/p_k^{l} = N/(p_k - 1)^2$$

Thus

$$N^{-1} \sum_{n=1}^{N} L_r(n) < \sum_{k>r} (p_k - 1)^{-2} = \delta_r$$

and $\delta_r \to 0$ as $r \to \infty$.
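Now,

$$\Big| N^{-1} \sum_{n=1}^{N} \exp\Big(i\xi \sum_{k=1}^{\infty} \beta_k(n)\Big) - N^{-1} \sum_{n=1}^{N} \exp\Big(i\xi \sum_{k=1}^{r} \beta_k(n)\Big) \Big| \le N^{-1} \sum_{n=1}^{N} |\exp(i\xi L_r(n)) - 1| \le |\xi|\, N^{-1} \sum_{n=1}^{N} L_r(n) < \delta_r |\xi| \tag{I.4.11}$$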


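and independence of the $\beta$'s (see (I.4.6)) implies that for each fixed $r$

$$\lim_{N\to\infty} N^{-1} \sum_{n=1}^{N} \exp\Big[i\xi \sum_{k=1}^{r} \beta_k(n)\Big] = \prod_{k=1}^{r} \lim_{N\to\infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi\beta_k(n)] = \prod_{k=1}^{r} (1 - 1/p_k)\,[1 + 1/(p_k - \exp i\xi)]$$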


The estimate of (I.4.11) together with the fact that $\delta_r \to 0$ as $r \to \infty$
implies, in view of the convergence of the infinite product

$$\prod_{k=1}^{\infty} (1 - 1/p_k)\,[1 + 1/(p_k - \exp i\xi)]$$

equation (I.4.9).
Thus (I.4.10) is justified and (see (I.4.8))


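$$p_k = (2\pi)^{-1} \int_0^{2\pi} \exp(-ik\xi) \prod_{j=1}^{\infty} (1 - 1/p_j)\,[1 + 1/(p_j - \exp i\xi)]\, d\xi \tag{I.4.12}$$

One can rewrite (I.4.12) in the equivalent form

$$\sum_{k=0}^{\infty} p_k z^k = \prod_{j=1}^{\infty} (1 - 1/p_j)\,[1 + 1/(p_j - z)] \tag{I.4.13}$$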




Setting $z = 0$ we obtain

$$p_0 = \prod_{k=1}^{\infty} (1 - 1/p_k^2) = 6/\pi^2$$

which is a well-known result to the effect that the density of
"square-free" ("Quadratfrei") numbers is $6/\pi^2$.
Setting $z = 1$ one obtains

$$\sum_{k=0}^{\infty} p_k = 1$$

which simply means that complete additivity holds for sets

$$B_k = \{\omega(n) - \nu(n) = k\}$$
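
The densities $p_k$ can be checked numerically. The following is a minimal sketch, not part of the original text: it tabulates $\omega(n) - \nu(n)$ for all $n$ up to a cutoff with a smallest-prime-factor sieve and compares the observed frequencies with $p_0 = 6/\pi^2$ and with $p_1 = p_0 \sum_p 1/(p(p+1))$, the value obtained by differentiating the product formula for $\sum p_k z^k$ at $z = 0$. The function names and the cutoff `limit` are assumptions made for illustration.

```python
import math


def smallest_prime_factors(limit):
    """Return spf where spf[n] is the smallest prime factor of n (2 <= n <= limit)."""
    spf = list(range(limit + 1))
    for i in range(2, math.isqrt(limit) + 1):
        if spf[i] == i:  # i is prime
            for j in range(i * i, limit + 1, i):
                if spf[j] == j:
                    spf[j] = i
    return spf


def excess_frequencies(limit):
    """Frequencies of the values of omega(n) - nu(n) among 1 <= n <= limit."""
    spf = smallest_prime_factors(limit)
    excess = [0] * (limit + 1)  # excess[n] = omega(n) - nu(n)
    counts = {}
    for n in range(1, limit + 1):
        if n > 1:
            p = spf[n]
            m = n // p
            # A repeated factor p adds one to omega but nothing to nu.
            excess[n] = excess[m] + (1 if m % p == 0 else 0)
        counts[excess[n]] = counts.get(excess[n], 0) + 1
    return {k: c / limit for k, c in counts.items()}, spf


def predicted_p0_p1(spf):
    """p_0 = 6/pi^2 and p_1 = p_0 * (sum over primes p < len(spf) of 1/(p(p+1)))."""
    p0 = 6 / math.pi ** 2
    prime_sum = sum(1 / (p * (p + 1)) for p in range(2, len(spf)) if spf[p] == p)
    return p0, p0 * prime_sum


if __name__ == "__main__":
    freqs, spf = excess_frequencies(200_000)
    p0, p1 = predicted_p0_p1(spf)
    print("k=0: observed", freqs[0], "predicted", p0)
    print("k=1: observed", freqs[1], "predicted", p1)
```

The agreement is only approximate at any finite cutoff, since a density is a limit of relative frequencies.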


Arguments based on probability theory should be used with 
extreme care. The following example (reproduced for the benefit 
of physicists who are notoriously optimistic in using heuristic 
arguments) illustrates the danger. 

The probability that an integer is not divisible by the primes
$p_1, \ldots, p_r$ is easily shown to be:

$$\prod_{j=1}^{r} (1 - 1/p_j)$$

In other words the number of integers up to $N$ not divisible
by primes $p_1, \ldots, p_r$ is approximately:

$$N \prod_{j=1}^{r} (1 - 1/p_j) \tag{I.4.13}$$

Consider now the integers up to $N$ not divisible by primes up
to $\sqrt{N}$. The only such integers are the primes between $\sqrt{N}$ and $N$
and their number is $\pi(N) - \pi(\sqrt{N})$. (We use the standard notation
$\pi(x)$ to denote the number of primes not exceeding $x$.)
From (I.4.13) we are tempted to conclude that

$$\pi(N) - \pi(\sqrt{N}) \sim N \prod_{p_j \le \sqrt{N}} (1 - 1/p_j) \tag{I.4.14}$$


But it is known that ("prime number theorem")

$$\pi(N) \sim N/\log N$$

and

$$\prod_{p_j \le \sqrt{N}} (1 - 1/p_j) \sim \exp(-\gamma)/\log\sqrt{N} = 2\exp(-\gamma)/\log N$$

($\gamma$ = Euler's constant). Consequently (I.4.14) would imply that

$$2\exp(-\gamma) = 1$$


which is certainly incorrect. 
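
The failure of the heuristic can be seen on a computer. The sketch below is an illustration only (the function names are assumptions): it counts the primes between $\sqrt{N}$ and $N$, evaluates the "independent" estimate $N \prod (1 - 1/p_j)$, and also returns the product multiplied by $\log\sqrt{N}$, which should be close to $\exp(-\gamma)$. Asymptotically the estimate exceeds the true count by the factor $2\exp(-\gamma) \approx 1.12$; at moderate $N$ the excess is smaller because $\pi(N)$ approaches $N/\log N$ slowly.

```python
import math

EULER_GAMMA = 0.5772156649015329


def primes_up_to(limit):
    """Sieve of Eratosthenes; returns the list of primes <= limit."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if is_prime[i]:
            # Strike out i*i, i*i + i, ... in one slice assignment.
            is_prime[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [n for n in range(limit + 1) if is_prime[n]]


def sieve_heuristic(n):
    """Return (pi(N) - pi(sqrt N), N * prod(1 - 1/p), prod * log(sqrt N)),
    the product being taken over primes p <= sqrt(N)."""
    primes = primes_up_to(n)
    root = math.isqrt(n)
    small = [p for p in primes if p <= root]
    actual = len(primes) - len(small)
    product = math.prod(1 - 1 / p for p in small)
    return actual, n * product, product * math.log(math.sqrt(n))


if __name__ == "__main__":
    actual, heuristic, mertens = sieve_heuristic(10 ** 6)
    print("primes in (sqrt N, N]:", actual)
    print("heuristic estimate:   ", heuristic)
    print("product * log sqrt N: ", mertens, "vs exp(-gamma) =", math.exp(-EULER_GAMMA))
```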

Lest the reader feel abused at being subjected to such a crude
argument we refer him to a series of letters which have appeared in 
Nature (Vol. 148 (1941) pp. 436 and 694-695) in which several 
authors engaged in a vigorous but fruitless discussion of this very 
problem. 


5. Example 4. As our last example we shall discuss briefly the 
theory of the game of “heads or tails.” 

The sample space $S$ is in this case the set of all infinite
sequences of the form

$$A_1 A_2 A_3 \ldots \tag{I.5.1}$$

where each $A_j$ is either the symbol H ("heads") or T ("tails").

The "elementary sets" are the sets of sequences, $k$ of whose
elements ($k = 1, 2, 3, \ldots$) are fixed.

For instance, an elementary set is a set of sequences whose
first three elements ($k = 3$) are HTH, or a set of sequences whose
25th, 27th and 35th elements are respectively T, T, H (here again
$k = 3$).

In the classical theory of coin tossing (‘‘fair’’ coin, independent 
trials) the measure assigned to an elementary set is 


$$1/2^k$$
where k is the number of fixed places. 


Let us consider all ($2^n$) sequences of length $n$:

$$A_1 A_2 \ldots A_n$$

Consider the number $N_n(\alpha)$ of these sequences for which the
number of H's differs (in absolute value) from $\tfrac{1}{2}n$ by less than $\alpha\sqrt{n}$.




Then the measure of the set of sequences having this property
is

$$2^{-n} N_n(\alpha)$$

Denoting by $H_n$ the number of H's in the sequence $A_1 A_2 \ldots A_n$,
we can write

$$\operatorname{Prob}\{|H_n - \tfrac{1}{2}n| < \alpha\sqrt{n}\} = 2^{-n} N_n(\alpha) \tag{I.5.2}$$


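Now $N_n(\alpha)$ is easily expressible in terms of binomial
coefficients and one gets

$$\operatorname{Prob}\{|H_n - \tfrac{1}{2}n| < \alpha\sqrt{n}\} = 2^{-n} \sum_{|k - \frac{1}{2}n| < \alpha\sqrt{n}} \binom{n}{k}$$

It is now an exercise (though not an entirely easy one) in
the use of Stirling's formula to prove that

$$\lim_{n\to\infty} 2^{-n} \sum_{|k - \frac{1}{2}n| < \alpha\sqrt{n}} \binom{n}{k} = (2\pi)^{-1/2} \int_{-2\alpha}^{2\alpha} \exp(-\tfrac{1}{2}x^2)\, dx \tag{I.5.3}$$

Combining (I.5.2) with (I.5.3) we get

$$\lim_{n\to\infty} \operatorname{Prob}\{|H_n - \tfrac{1}{2}n| < \alpha\sqrt{n}\} = (2\pi)^{-1/2} \int_{-2\alpha}^{2\alpha} \exp(-\tfrac{1}{2}x^2)\, dx \tag{I.5.4}$$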
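
The limit can be examined numerically without Stirling's formula. The sketch below (an illustration; the function names are assumptions) evaluates the binomial sum exactly with integer arithmetic and compares it with the Gaussian integral, written through the error function as $\operatorname{erf}(2\alpha/\sqrt{2})$.

```python
import math


def binomial_probability(n, alpha):
    """Exact Prob{|H_n - n/2| < alpha*sqrt(n)} for n tosses of a fair coin."""
    bound = alpha * math.sqrt(n)
    favourable = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) < bound)
    return favourable / 2 ** n  # exact integers, converted to float only at the end


def gaussian_limit(alpha):
    """(2*pi)^(-1/2) times the integral of exp(-x^2/2) over (-2*alpha, 2*alpha)."""
    return math.erf(2 * alpha / math.sqrt(2))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, binomial_probability(n, 0.5), gaussian_limit(0.5))
```

The strict inequality in the sum makes the finite-$n$ value depend on whether $\alpha\sqrt{n}$ is an integer, so the approach to the limit is not monotone.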


The two formulas (1.5.3) and (1.5.4) are entirely equivalent 
but the form (1.5.4) suggests a possibility of an experimental 
verification. 

Suppose that we toss our coin $mn$ times and collect the results
into $m$ groups of $n$ observations each. In each group we count the
numbers $H^{(1)}, H^{(2)}, \ldots, H^{(m)}$ of heads.

We then count the number $f_\alpha(m)$ of groups for which

$$|H^{(j)} - \tfrac{1}{2}n| < \alpha\sqrt{n}$$

and compare the frequency

$$f_\alpha(m)/m$$

with

$$(2\pi)^{-1/2} \int_{-2\alpha}^{2\alpha} \exp(-\tfrac{1}{2}x^2)\, dx$$



If our theory is applicable we expect reasonable agreement 
between the two numbers. Occasionally, of course, the discrep- 
ancy may be large, but this may be attributed to “bad luck.” 

If the discrepancies are large oftener than the theory predicts 
it can still be “bad luck” but we would be justified in becoming 
suspicious and be forced to reexamine our basic assumption. We 
may discover that the coin is “loaded” or that the tossing tech- 
nique is such as to cast grave doubts on the assumption of in- 
dependence. 
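
The experiment just described can be imitated with a pseudo-random generator in place of a coin. This is a sketch under that assumption (the generator, the seed, and the function names are not from the text): it forms $m$ groups of $n$ simulated tosses and returns the fraction of groups satisfying the inequality, to be compared with the exact binomial value for that $n$ and with the Gaussian limit.

```python
import math
import random


def group_frequency(m, n, alpha, seed=0):
    """Toss a simulated fair coin m*n times in m groups of n; return the fraction
    of groups whose number of heads H satisfies |H - n/2| < alpha*sqrt(n)."""
    rng = random.Random(seed)
    bound = alpha * math.sqrt(n)
    hits = 0
    for _ in range(m):
        heads = bin(rng.getrandbits(n)).count("1")  # n independent fair bits
        if abs(heads - n / 2) < bound:
            hits += 1
    return hits / m


def exact_probability(n, alpha):
    """Exact binomial value of Prob{|H - n/2| < alpha*sqrt(n)} for a fair coin."""
    bound = alpha * math.sqrt(n)
    return sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) < bound) / 2 ** n


if __name__ == "__main__":
    print("observed frequency:", group_frequency(4000, 400, 0.5, seed=1))
    print("exact for n = 400: ", exact_probability(400, 0.5))
    print("Gaussian limit:    ", math.erf(0.5 * math.sqrt(2)))
```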

The reader has undoubtedly noticed that in formulating the
theorem (I.5.4) we make no use of our sample space $S$ of all infinite
sequences $A_1 A_2 A_3 \ldots$.

In fact, all we needed was the simple space of all finite
sequences $A_1, \ldots, A_n$. The limit $n \to \infty$ is relatively innocuous.
It does not require the "infinitely dimensional" space of all
sequences but rather a sequence of "finitely dimensional" spaces in
each of which the measure theory is completely trivial.

Why then speak of S at all? There are good and cogent 
reasons for doing so. 

For we may also inquire what is the probability

$$\operatorname{Prob}\{\lim_{n\to\infty} H_n/n = \beta\}$$

or in other words, what is the measure of the set of those infinite
sequences for which the limit of $H_n/n$ exists and is equal to $\beta$?

To answer this question we must construct a measure theory 
in the set S consistent with the agreed assignment of measures to 
elementary sets. 

This can be done as follows. Interpret H as 1 and T as 0.
Each infinite sequence $A_1 A_2 \ldots$ becomes now a sequence $\varepsilon_1 \varepsilon_2 \ldots$
of zeros and ones. Associate with each such sequence the real
number $t$

$$t = 2^{-1}\varepsilon_1 + 2^{-2}\varepsilon_2 + 2^{-3}\varepsilon_3 + \ldots$$

The sample space is "mapped" this way on the interval
$0 \le t \le 1$.

The mapping is not one-to-one because, e.g., the sequences
$1, 0, 0, 0, \ldots$ and $0, 1, 1, \ldots$ map into the same number, namely $\tfrac{1}{2}$.

However, the set of such numbers is countable (it consists of
"dyadic rationals," i.e., numbers of the form $r/2^s$ with $r$ and $s$
integers) and it will not matter at all.

Elementary sets will be mapped into unions of intervals whose 
end points are dyadic rationals and moreover the measure of each 
elementary set is equal to the sum of the lengths of the intervals into 
which the elementary set is mapped. 

Thus to construct the desired measure in S it is sufficient to 
construct a measure on the interval (0,1) consistent with the fore- 
going assignment of measures to elementary sets. This measure 
in (0, 1) is clearly the ordinary Lebesgue measure and we are now 
in the position to translate our question into a more familiar 
language. 

Write

$$t = 2^{-1}\varepsilon_1(t) + 2^{-2}\varepsilon_2(t) + \ldots$$

and consider the set of those $t$'s (in the interval (0, 1)) for which

$$\lim_{n\to\infty} \frac{\varepsilon_1(t) + \ldots + \varepsilon_n(t)}{n} = \beta$$

Our problem is now to determine the Lebesgue measure of
this set.

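To do this we verify that the functions $\varepsilon_1(t), \varepsilon_2(t), \ldots$ are
independent, i.e.,

$$\mu\{\varepsilon_{k_1}(t) = \eta_1, \varepsilon_{k_2}(t) = \eta_2, \ldots, \varepsilon_{k_r}(t) = \eta_r\} = \prod_{j=1}^{r} \mu\{\varepsilon_{k_j}(t) = \eta_j\} \qquad (k_1 < k_2 < \ldots < k_r)$$

where $\mu$ is the ordinary Lebesgue measure and where each $\eta_j$ is
either 0 or 1.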
Introduce now the auxiliary functions $r_j(t)$

$$r_j(t) = 1 - 2\varepsilon_j(t)$$

and note that they too are independent and hence

$$\int_0^1 r_{k_1}(t) \cdots r_{k_s}(t)\, dt = \prod_{j=1}^{s} \int_0^1 r_{k_j}(t)\, dt = 0 \qquad (k_1 < \ldots < k_s)$$

It is now easy to see that

$$\int_0^1 \Big(n^{-1} \sum_{j=1}^{n} r_j(t)\Big)^4 dt = n^{-4}\Big(n + 6\binom{n}{2}\Big) = O(n^{-2})$$

and consequently:

$$\sum_{n=1}^{\infty} \int_0^1 \Big(n^{-1} \sum_{j=1}^{n} r_j(t)\Big)^4 dt < \infty$$
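
The fourth-moment identity can be confirmed by brute force for small $n$. The sketch below is illustrative (the function name is an assumption): since the first $n$ binary digits of $t$ are constant on each dyadic interval of length $2^{-n}$, the integral over (0, 1) is an average over all $2^n$ digit patterns.

```python
from itertools import product
from math import comb


def fourth_moment_of_sum(n):
    """Integral over (0,1) of (r_1(t) + ... + r_n(t))^4, computed exactly
    as an average over the 2^n patterns of the first n binary digits."""
    total = 0
    for digits in product((0, 1), repeat=n):
        s = sum(1 - 2 * e for e in digits)  # r_j = 1 - 2*eps_j
        total += s ** 4
    return total / 2 ** n


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, fourth_moment_of_sum(n), n + 6 * comb(n, 2))
```

Dividing by $n^4$ gives the integrand of the convergent series, which is of order $3/n^2$.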


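By a standard theorem$^2$ from the theory of Lebesgue's measure it
follows that the series

$$\sum_{n=1}^{\infty} \Big(n^{-1} \sum_{j=1}^{n} r_j(t)\Big)^4$$

converges almost everywhere, i.e., except on a set of measure 0.
Thus, a fortiori,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=1}^{n} r_j(t) = 0$$

except on a set of measure 0 or in other words

$$\operatorname{Prob}\{\lim_{n\to\infty} H_n/n = \tfrac{1}{2}\} = 1$$

and

$$\operatorname{Prob}\{\lim_{n\to\infty} H_n/n = \beta\} = 0, \qquad \beta \ne \tfrac{1}{2}$$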


We have demonstrated the simplest case of the “strong law 
of large numbers.”’ 

Speaking somewhat loosely we may say that ‘strong laws” 
refer to measures in “‘infinitely dimensional’ spaces, whereas 
“weak laws” (e.g., (I.5.4)) are formulated in terms of finite dimen- 
sional spaces of increasing dimensionality. 

In my opinion most "strong laws" are only of mathematical
interest. When it comes to applications of probability theory
(especially to Physics) the "weak laws" are of primary interest
and importance.

2 The theorem in question is that if $f_n(t) \ge 0$ and $\sum_{n=1}^{\infty} \int_0^1 f_n(t)\, dt$
converges, the series $\sum_{n=1}^{\infty} f_n(t)$ converges almost everywhere, i.e., except on
a set of measure 0.

Consequently we shall not be concerned in the sequel with 
“strong laws.” 

At this point it should have become clear to the reader that 
probabilistic reasoning consists in imbedding a particular situation 
in an ensemble of like situations and replacing statements about 
individuals by statements about the ensemble. 

I cannot resist adding a final example because of its simplicity 
and instructiveness. 

There is a problem, due I believe to Fréchet, of finding the
maximum of the determinant

$$\Delta_n = \|\varepsilon_{ij}\|, \qquad i, j = 1, 2, \ldots, n$$

where the $\varepsilon$'s are $\pm 1$.
The familiar estimate of Hadamard gives

$$\max |\Delta_n| \le n^{n/2}$$

and the problem (still unsolved!) is to determine when the upper
bound can actually be achieved.
A statistical method leads immediately to a pertinent result. 
Let $\varepsilon(x)$, $0 \le x \le 1$, be such that


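We have (setting $\varepsilon_{ij} = \varepsilon(x_{ij})$)

$$\max |\Delta_n| \ge \Big[\int_0^1 \cdots \int_0^1 \Delta_n^2 \prod_{i,j} dx_{ij}\Big]^{1/2}$$

and a trivial computation gives

$$\int_0^1 \cdots \int_0^1 \Delta_n^2 \prod_{i,j} dx_{ij} = n!$$

Thus

$$\sqrt{n!} \le \max |\Delta_n| \le n^{n/2}$$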
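
For very small $n$ the ensemble of determinants can simply be enumerated. The sketch below is illustrative and makes one assumption beyond the text: the entries are taken to be independent and equal to $+1$ or $-1$ with probability $\tfrac{1}{2}$ each, so that the ensemble average becomes an average over all $2^{n^2}$ sign matrices. It returns the mean of $\Delta_n^2$ and the largest $|\Delta_n|$.

```python
from itertools import permutations, product
from math import factorial


def permutation_sign(perm):
    """Sign of a permutation of 0..n-1, from its number of inversions."""
    inversions = sum(
        1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def determinant_statistics(n):
    """Enumerate all n x n matrices with entries +1/-1.
    Returns (mean of det^2, max of |det|)."""
    perms = [(p, permutation_sign(p)) for p in permutations(range(n))]
    total_sq, largest, count = 0, 0, 0
    for entries in product((-1, 1), repeat=n * n):
        det = 0
        for perm, sign in perms:  # Leibniz expansion of the determinant
            term = sign
            for row, col in enumerate(perm):
                term *= entries[row * n + col]
            det += term
        total_sq += det * det
        largest = max(largest, abs(det))
        count += 1
    return total_sq / count, largest


if __name__ == "__main__":
    for n in (2, 3):
        mean_sq, largest = determinant_statistics(n)
        print(n, "mean det^2 =", mean_sq, "n! =", factorial(n), "max |det| =", largest)
```

The enumeration grows as $2^{n^2}$, so it is practical only for $n \le 4$ or so; the point of the statistical argument is that the mean is available for every $n$.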


One can actually go farther using

$$\max |\Delta_n| \ge \Big[\int_0^1 \cdots \int_0^1 \Delta_n^4 \prod_{i,j} dx_{ij}\Big]^{1/4}$$

and obtain a sharper lower bound.

Replacing 4 by 6, 8, etc., it should be possible, in principle, to
find $\max |\Delta_n|$ exactly but the calculations become (hopelessly?)
difficult.

Still the advantages of looking at the ensemble of determi- 
nants are quite apparent. 


CHAPTER II 


Some Tools and Techniques of Probability Theory 


1. In this chapter we shall discuss in detail a number of problems 
to illustrate some of the analytic techniques and tools of proba- 
bility theory. We begin with a random walk problem which is of 
some independent interest. 

A point starting from the origin moves in the plane in steps of
length 1. The direction of the $k$th step is chosen in such a way that
the angle it makes with the previous step has the probability
density $p(\alpha)$ $(-\pi \le \alpha < \pi)$. The zeroth direction is taken to be
that of the positive $x$-axis and the successive angles $\alpha_1, \alpha_2, \ldots$
are assumed to be chosen independently.

We wish to find the distribution of the position of the point
after $n$ steps, and especially its asymptotic behavior as $n \to \infty$.

Let us first translate the problem into the ordinary language 
of analysis. 

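Consider a specific choice of angles

$$\alpha_1, \alpha_2, \ldots, \alpha_n$$

With this choice the coordinates $(x_n, y_n)$ of the point after
$n$ steps are clearly:

$$\begin{aligned} x_n &= \cos\alpha_1 + \cos(\alpha_1 + \alpha_2) + \ldots + \cos(\alpha_1 + \ldots + \alpha_n) \\ y_n &= \sin\alpha_1 + \sin(\alpha_1 + \alpha_2) + \ldots + \sin(\alpha_1 + \ldots + \alpha_n) \end{aligned} \tag{II.1.1}$$

We want to determine the joint probability

$$\operatorname{Prob}\{x_n < a,\ y_n < b\}$$

Actually, we shall determine the limit as $n \to \infty$ of

$$\operatorname{Prob}\{x_n < a\sqrt{n},\ y_n < b\sqrt{n}\}$$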


The sample space $S$ is the $n$-dimensional cube $-\pi \le \alpha_j < \pi$,
$j = 1, 2, \ldots, n$. Our assumptions that the $\alpha$'s are independent
and that each is distributed according to the density function $p(\alpha)$
is translated to mean that the measure assigned to an elementary
set $B$ is

$$\int \cdots \int_{B} p(\alpha_1) \cdots p(\alpha_n)\, d\alpha_1 \cdots d\alpha_n \tag{II.1.2}$$


(Elementary sets can be, for instance, cubes with sides parallel 
to the coordinate axes.) 
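It should be clear that

$$p(\alpha) \ge 0 \quad \text{and} \quad \int_{-\pi}^{\pi} p(\alpha)\, d\alpha = 1$$

The desired probability is thus

$$\int \cdots \int_{x_n < a\sqrt{n},\ y_n < b\sqrt{n}} p(\alpha_1) \cdots p(\alpha_n)\, d\alpha_1 \cdots d\alpha_n$$

To simplify the problem somewhat let us remove the restriction
$y_n < b\sqrt{n}$ (this amounts to setting $b = \infty$) and concentrate
on

$$\sigma_n(a) = \int \cdots \int_{x_n < a\sqrt{n}} p(\alpha_1) \cdots p(\alpha_n)\, d\alpha_1 \cdots d\alpha_n \tag{II.1.3}$$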
Now $\sigma_n(a)$ as a function of $a$ has the following properties:

1. $\sigma_n(a)$ is nondecreasing;

2. $\sigma_n(-\infty) = 0$, $\sigma_n(+\infty) = 1$;

3. $\sigma_n(a)$ is left continuous (in our case actually continuous).$^3$

A function $\sigma(a)$ satisfying conditions 1, 2, and 3 is called a
distribution function.


3 If instead of assuming that the $\alpha_j$'s are distributed in accordance
with a density function we were to assume, for instance, that each $\alpha_j$ can
assume two values $+\beta$ and $-\beta$ each with probability $\tfrac{1}{2}$, $\sigma_n(a)$ would now
be defined as a sum and it would not be continuous. It would still be left
continuous since left continuity is a simple consequence of complete
additivity of measure.

One can treat the discrete and continuous (and mixed) cases simul- 
taneously by the use of Stieltjes integrals, but this is a minor point which 
we prefer to by-pass, referring the interested reader to any standard text. 




The question of convergence of distribution functions has 
received a great deal of attention in mathematical literature 
(motivated mainly by the needs of probability theory). 

The following result (due to Paul Lévy) is by now classical.
Let

$$f_n(\xi) = \int_{-\infty}^{\infty} \exp(i\xi a)\, d\sigma_n(a) \qquad (\xi \text{ real})$$

and assume that

$$\lim_{n\to\infty} f_n(\xi) = f(\xi)$$

uniformly in every finite $\xi$-interval. Then there exists a unique
distribution function $\sigma(a)$ such that

$$f(\xi) = \int_{-\infty}^{\infty} \exp(i\xi a)\, d\sigma(a)$$

and such that

$$\sigma_n(a) \to \sigma(a)$$

at every point $a$ at which $\sigma(a)$ is continuous.

The condition that $f_n(\xi)$ approach $f(\xi)$ uniformly in every
finite $\xi$-interval can be replaced by a simpler sounding (but
equivalent) condition that $f(\xi)$ be continuous for $\xi = 0$.

The success of this theorem in the theory of probability 
(notably in the theory of sums of independent random variables) 
has overshadowed an earlier theorem which is nevertheless very 
useful and which can often be applied successfully when the fore- 
going theorem is useless. 

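The theorem in question is the following: If

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} a^k\, d\sigma_n(a) = \int_{-\infty}^{\infty} a^k\, d\sigma(a) \qquad (k = 0, 1, 2, \ldots) \tag{II.1.4}$$

and if

$$\sum_{k=1}^{\infty} (\mu_k)^{-1/k} = \infty, \quad \text{where } \mu_k = \int_{-\infty}^{\infty} |a|^k\, d\sigma(a) \tag{II.1.5}$$

then again

$$\sigma_n(a) \to \sigma(a) \tag{II.1.6}$$

at every continuity point of $\sigma(a)$.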


The reason for condition (II.1.5) is that in case it is violated
there may exist a distribution function $\tau(a)$ different from $\sigma(a)$
such that

$$\int_{-\infty}^{\infty} a^k\, d\tau(a) = \int_{-\infty}^{\infty} a^k\, d\sigma(a)$$

Thus by setting $\sigma_n(a) = \tau(a)$ for $n = 1, 2, \ldots$ the conclusion
(II.1.6) would be violated.
Condition (II.1.5) simply insures that the moments

$$\int_{-\infty}^{\infty} a^k\, d\sigma(a)$$

determine $\sigma(a)$ uniquely.

Finally, there is a rather weak theorem of the same character 
which is again often useful (and which we shall use to solve our 
random walk problem). 

If for all real $x$

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \exp(xa)\, d\sigma_n(a) = g(x) = \int_{-\infty}^{\infty} \exp(xa)\, d\sigma(a) \tag{II.1.7}$$

and if $g(x)$ is entire (i.e., there exists an entire function $g(z)$ which
reduces to $g(x)$ on the real axis) then

$$\sigma_n(a) \to \sigma(a) \tag{II.1.8}$$

at each continuity point of $\sigma(a)$.
Since in this form the theorem is not easily available in print,
we sketch a proof.
Set

$$g_n(z) = \int_{-\infty}^{\infty} \exp(za)\, d\sigma_n(a) \qquad (z = x + iy)$$

and note that

$$|g_n(z)| \le \int_{-\infty}^{\infty} |\exp(za)|\, d\sigma_n(a) = \int_{-\infty}^{\infty} \exp(xa)\, d\sigma_n(a)$$

Moreover, it is easily seen that for each $R > 0$ the functions $g_n(z)$
are analytic in $|z| < R$ if $n$ is sufficiently large.
Since $\lim_{n\to\infty} g_n(x)$ exists for every real $x$ it follows that for
$|z| < R$ the functions $g_n(z)$ are uniformly bounded and hence a
subsequence

$$g_{n_k}(z)$$

can be chosen which for $|z| < R$ converges uniformly to an analytic
function. This analytic function agrees with $g(z)$ for $-R < x < R$
and hence everywhere in the circle $|z| < R$. It now follows that

$$\int_{-\infty}^{\infty} \exp(i\xi a)\, d\sigma_{n_k}(a) \qquad (\xi \text{ real})$$

converges uniformly in every finite $\xi$-interval to $g(i\xi)$ and hence

$$\sigma_{n_k}(a) \to \sigma(a)$$

The same argument shows that every subsequence $\{m_k\}$ contains
a subsequence $\{m_{k_j}\}$ such that

$$\sigma_{m_{k_j}}(a) \to \sigma(a)$$

and this implies that

$$\sigma_n(a) \to \sigma(a)$$

2. We now turn to our original random walk problem and
consider

$$\int_{-\infty}^{\infty} \exp(xa)\, d\sigma_n(a)$$

where $\sigma_n(a)$ is given by (II.1.3).
We note that

$$g_n(x) = \int_{-\infty}^{\infty} \exp(xa)\, d\sigma_n(a) = \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} \exp(x x_n n^{-1/2})\, p(\alpha_1) \cdots p(\alpha_n)\, d\alpha_1 \cdots d\alpha_n \tag{II.2.1}$$

where $x_n$ is given by (II.1.1).
Instead of $g_n(x)$ consider a slightly more general expression:

$$g_m(x, y) = \int_{-\pi}^{\pi} \cdots \int_{-\pi}^{\pi} \exp[(x x_m + y y_m) n^{-1/2}]\, p(\alpha_1) \cdots p(\alpha_m)\, d\alpha_1 \cdots d\alpha_m \tag{II.2.2}$$

and note that from the definitions of $x_m$ and $y_m$ it follows that
(setting $x = r\cos\theta$, $y = r\sin\theta$)

$$x x_m + y y_m = r \sum_{k=1}^{m} \cos(\alpha_1 + \ldots + \alpha_k - \theta)$$

Setting

$$g_m(x, y) = h_m(\theta) \tag{II.2.3}$$

(we keep $r$ fixed and concentrate on the dependence on $\theta$) we
obtain at once

$$h_m(\theta) = \int_{-\pi}^{\pi} p(\alpha_1) \exp[r n^{-1/2} \cos(\theta - \alpha_1)]\, h_{m-1}(\theta - \alpha_1)\, d\alpha_1 \tag{II.2.4}$$

where

$$h_0(\theta) = 1$$
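If we consider $p(\alpha)$ to be periodic with period $2\pi$ we get by a simple
change of variable

$$h_m(\theta) = \int_{-\pi}^{\pi} p(\theta - \alpha) \exp(r n^{-1/2} \cos\alpha)\, h_{m-1}(\alpha)\, d\alpha \tag{II.2.5}$$

Multiplying both sides of (II.2.5) by $\exp(\tfrac{1}{2} r n^{-1/2} \cos\theta)$ and setting

$$\exp(\tfrac{1}{2} r n^{-1/2} \cos\theta)\, h_m(\theta) = \phi_m(\theta) \tag{II.2.6}$$

we obtain

$$\phi_m(\theta) = \int_{-\pi}^{\pi} \exp(\tfrac{1}{2} r n^{-1/2} \cos\theta)\, p(\theta - \alpha) \exp(\tfrac{1}{2} r n^{-1/2} \cos\alpha)\, \phi_{m-1}(\alpha)\, d\alpha \tag{II.2.7}$$

$$(\phi_0(\theta) = \exp(\tfrac{1}{2} r n^{-1/2} \cos\theta))$$

If we now assume

$$p(\alpha) = p(-\alpha)$$

and

$$\int_{-\pi}^{\pi} p^2(\alpha)\, d\alpha < \infty$$

we can reduce our problem to a perturbation calculation. (It now
becomes apparent why we decided to work with

$$\int_{-\infty}^{\infty} \exp(xa)\, d\sigma_n(a)$$

for real $x$. It is this form which yields an integral equation with
a real symmetric kernel.)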


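Let $\lambda_1(n), \lambda_2(n), \ldots$ be the eigenvalues and $\phi_1^{(n)}(\alpha), \phi_2^{(n)}(\alpha), \ldots$
the corresponding normalized eigenfunctions of the integral
equation:

$$\int_{-\pi}^{\pi} \exp(\tfrac{1}{2} r n^{-1/2} \cos\theta)\, p(\theta - \alpha) \exp(\tfrac{1}{2} r n^{-1/2} \cos\alpha)\, \phi(\alpha)\, d\alpha = \lambda \phi(\theta) \tag{II.2.8}$$

We obtain

$$\phi_n(\theta) = \sum_{j=1}^{\infty} \lambda_j^n(n) \int_{-\pi}^{\pi} \exp(\tfrac{1}{2} r n^{-1/2} \cos\alpha)\, \phi_j^{(n)}(\alpha)\, d\alpha\ \phi_j^{(n)}(\theta) \tag{II.2.9}$$

and consequently

$$h_n(\theta) = \exp(-\tfrac{1}{2} r n^{-1/2} \cos\theta) \sum_{j=1}^{\infty} \lambda_j^n(n) \int_{-\pi}^{\pi} \exp(\tfrac{1}{2} r n^{-1/2} \cos\alpha)\, \phi_j^{(n)}(\alpha)\, d\alpha\ \phi_j^{(n)}(\theta) \tag{II.2.10}$$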

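It is now quite clear what happens. Writing

$$p(\alpha) \sim (2\pi)^{-1} + \sum_{j=1}^{\infty} a_j \cos j\alpha \tag{II.2.11}$$

we have

$$p(\theta - \alpha) \sim (2\pi)^{-1/2} (2\pi)^{-1/2} + \sum_{j=1}^{\infty} \pi a_j \left(\pi^{-1/2} \cos j\theta\ \pi^{-1/2} \cos j\alpha + \pi^{-1/2} \sin j\theta\ \pi^{-1/2} \sin j\alpha\right) \tag{II.2.12}$$

and hence the eigenvalues of the unperturbed kernel $p(\theta - \alpha)$ are

$$1, \pi a_1, \pi a_2, \ldots$$

Each eigenvalue $\pi a_k$ is double and the corresponding normalized
eigenfunctions are $\pi^{-1/2} \cos k\theta$, $\pi^{-1/2} \sin k\theta$. Moreover,

$$|\pi a_k| < 1$$

The eigenvalues of the perturbed kernel

$$\exp(\tfrac{1}{2} r n^{-1/2} \cos\theta)\, p(\theta - \alpha) \exp(\tfrac{1}{2} r n^{-1/2} \cos\alpha)$$

will be close to the corresponding eigenvalues of the unperturbed
one, and since in formula (II.2.10) they are raised to the power $n$,
we see that only the eigenvalue close to 1 will matter. Thus as
$n \to \infty$

$$h_n(\theta) \sim \exp(-\tfrac{1}{2} r n^{-1/2} \cos\theta)\, \lambda_1^n(n) \int_{-\pi}^{\pi} \exp(\tfrac{1}{2} r n^{-1/2} \cos\alpha)\, \phi_1^{(n)}(\alpha)\, d\alpha\ \phi_1^{(n)}(\theta) \tag{II.2.13}$$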

We now apply the standard perturbation technique to evaluate
$\lambda_1(n)$ and $\phi_1^{(n)}(\alpha)$. We write

$$\lambda_1(n) = 1 + n^{-1/2}\mu_1 + n^{-1}\mu_2 + \ldots$$

$$\phi_1^{(n)}(\alpha) = (2\pi)^{-1/2} + n^{-1/2}\psi_1(\alpha) + n^{-1}\psi_2(\alpha) + \ldots$$

and substitute these series in the integral equation (II.2.8).

Comparison of coefficients of $n^{-1/2}$ and $n^{-1}$ on the two sides of
the equation yields:


(II.2.14) 


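$$\int_{-\pi}^{\pi} p(\theta - \alpha) \left[\psi_1(\alpha) + \tfrac{1}{2} r (2\pi)^{-1/2} (\cos\theta + \cos\alpha)\right] d\alpha = (2\pi)^{-1/2} \mu_1 + \psi_1(\theta) \tag{II.2.15}$$

and

$$\int_{-\pi}^{\pi} p(\theta - \alpha) \left[\psi_2(\alpha) + \tfrac{1}{2} r \psi_1(\alpha) (\cos\theta + \cos\alpha) + \tfrac{1}{2} (2\pi)^{-1/2} (\tfrac{1}{2} r)^2 (\cos\theta + \cos\alpha)^2\right] d\alpha = \psi_2(\theta) + \mu_1 \psi_1(\theta) + (2\pi)^{-1/2} \mu_2 \tag{II.2.16}$$

Since $\phi_1^{(n)}(\alpha)$ is normalized we have also

$$\int_{-\pi}^{\pi} \psi_1(\alpha)\, d\alpha = 0 \tag{II.2.17}$$

Integrating both sides of (II.2.15) with respect to $\theta$ we obtain

$$\mu_1 = 0 \tag{II.2.18}$$

and it follows further that

$$\psi_1(\alpha) = \tfrac{1}{2} r (2\pi)^{-1/2} (1 + \pi a_1)(1 - \pi a_1)^{-1} \cos\alpha \tag{II.2.19}$$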


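Integrating (II.2.16) with respect to $\theta$ and taking into account
(II.2.18) and (II.2.19) we get

$$\mu_2 = \tfrac{1}{4} r^2 (2\pi)^{-1} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} p(\theta - \alpha) \left[(1 + \pi a_1)(1 - \pi a_1)^{-1} \cos\alpha\, (\cos\alpha + \cos\theta) + \tfrac{1}{2} (\cos\alpha + \cos\theta)^2\right] d\alpha\, d\theta$$

Under the restrictive conditions on $p$ the foregoing perturbation
calculation can be rigorously justified.
Finally we have

$$\lim_{n\to\infty} h_n(\theta) = \exp(\tfrac{1}{2} \sigma^2 r^2)$$

where

$$\sigma^2 = (4\pi)^{-1} \int_{-\pi}^{\pi} \int_{-\pi}^{\pi} p(\theta - \alpha) \left[(1 + \pi a_1)(1 - \pi a_1)^{-1} \cos\alpha\, (\cos\alpha + \cos\theta) + \tfrac{1}{2} (\cos\alpha + \cos\theta)^2\right] d\alpha\, d\theta$$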
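and setting $y = 0$

$$\lim_{n\to\infty} \int_{-\infty}^{\infty} \exp(xa)\, d\sigma_n(a) = \exp(\tfrac{1}{2} \sigma^2 x^2) = \sigma^{-1} (2\pi)^{-1/2} \int_{-\infty}^{\infty} \exp(xa) \exp(-a^2/2\sigma^2)\, da \tag{II.2.20}$$

Thus

$$\sigma_n(a) = P\{x_n < a\sqrt{n}\} \to \sigma^{-1} (2\pi)^{-1/2} \int_{-\infty}^{a} \exp(-x^2/2\sigma^2)\, dx \tag{II.2.21}$$

The theorem (see (II.1.7) and (II.1.8)) on the basis of which
conclusion (II.2.21) is drawn can be easily extended to several
variables and one can obtain the more inclusive result:

$$\sigma_n(a, b) = P\{x_n < a\sqrt{n},\ y_n < b\sqrt{n}\} \to \sigma^{-2} (2\pi)^{-1} \int_{-\infty}^{a} \int_{-\infty}^{b} \exp[-(x^2 + y^2)/2\sigma^2]\, dx\, dy$$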
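
The limiting variance can be compared with a simulation. Carrying out the double integral for $\sigma^2$ with $\int p(\theta - \alpha) \cos\alpha\, d\alpha = \pi a_1 \cos\theta$ gives $\sigma^2 = \tfrac{1}{2}(1 + \pi a_1)/(1 - \pi a_1)$, where $\pi a_1 = \int p(\alpha) \cos\alpha\, d\alpha$. The sketch below assumes, purely for illustration, that the turning angle is uniform on $(-b, b)$ with $b \le \pi$, so that $\pi a_1 = \sin b / b$; the function names, the choice of density, and the seed are not from the text.

```python
import math
import random


def scaled_endpoints(n, trials, half_width, seed=0):
    """Simulate x_n / sqrt(n) when each turning angle is uniform on
    (-half_width, half_width), an illustrative symmetric density p."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        direction, x = 0.0, 0.0  # zeroth direction: positive x-axis
        for _ in range(n):
            direction += rng.uniform(-half_width, half_width)
            x += math.cos(direction)
        samples.append(x / math.sqrt(n))
    return samples


def predicted_variance(half_width):
    """sigma^2 = (1 + c) / (2 (1 - c)) with c = pi*a_1 = E[cos(alpha)] = sin(b)/b."""
    c = math.sin(half_width) / half_width
    return (1 + c) / (2 * (1 - c))


def sample_variance(values):
    """Unbiased sample variance about the sample mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


if __name__ == "__main__":
    samples = scaled_endpoints(200, 3000, 2.0, seed=1)
    print("simulated variance:", sample_variance(samples))
    print("predicted sigma^2: ", predicted_variance(2.0))
```

The variance is taken about the sample mean because $E\{x_n\}$ stays bounded as $n$ grows and contributes only a term of order $n^{-1/2}$ after scaling.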


3. The preceding example is a simplified version of a problem 
which has received considerable attention in connection with chain 
molecules. 

The actual problem is the following. Given a chain of $n$ links
in space, all links having length 1. Each link forms a fixed angle
(called "valence angle") with the preceding link and no other
constraints are put upon the chain.

The problem is to find the distribution of the "size" of the
chain (defined as the distance between the initial and final points).

Denoting by $r_k$ the vector (of length 1) representing the $k$th
link and by $i_k$ and $j_k$ two mutually perpendicular unit vectors both
perpendicular to $r_k$ we can write

$$r_{k+1} = i_k \sin\alpha \cos\theta_k + j_k \sin\alpha \sin\theta_k + r_k \cos\alpha$$

where $\theta_k$ is appropriately chosen in $(0, 2\pi)$.
We can now define $i_{k+1}$ and $j_{k+1}$ by the formulas:

$$\begin{aligned} i_{k+1} &= \sin\theta_k\, i_k - \cos\theta_k\, j_k \\ j_{k+1} &= \cos\theta_k \cos\alpha\, i_k + \sin\theta_k \cos\alpha\, j_k - \sin\alpha\, r_k \end{aligned}$$

and note that

$$i_{k+1} \cdot j_{k+1} = i_{k+1} \cdot r_{k+1} = j_{k+1} \cdot r_{k+1} = 0$$

The chain is now defined by the angles

$$\theta_1, \theta_2, \ldots, \theta_{n-1} \qquad (0 \le \theta_k < 2\pi)$$

and hence we can take the $(n - 1)$-dimensional cube

$$0 \le \theta_k < 2\pi, \qquad k = 1, 2, \ldots, n - 1$$

as our sample space $S$.

As the measure in $S$ we take the ordinary Lebesgue measure
normalized so that the measure of the whole cube is 1.

(In probabilistic terminology we say that the $\theta$'s are independent
and uniformly distributed in $(0, 2\pi)$.)

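Taking $r_1$ to be arbitrary we see that the "size" of the chain
is the length of the vector

$$R_n = r_1 + r_2 + \ldots + r_n$$

It is quite easy to calculate the average of $\|R_n\|^2$, i.e.,

$$E\{\|R_n\|^2\} = \langle \|R_n\|^2 \rangle = (2\pi)^{-(n-1)} \int_0^{2\pi} \cdots \int_0^{2\pi} \|R_n\|^2\, d\theta_1 \cdots d\theta_{n-1}$$

In fact, let $A_k = A(\theta_k)$ be the matrix

$$A_k = A(\theta_k) = \begin{pmatrix} \sin\theta_k & -\cos\theta_k & 0 \\ \cos\alpha \cos\theta_k & \cos\alpha \sin\theta_k & -\sin\alpha \\ \sin\alpha \cos\theta_k & \sin\alpha \sin\theta_k & \cos\alpha \end{pmatrix}$$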


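and note that the elements of the third row of the matrix

$$B_n = I + A_1 + A_2 A_1 + \ldots + A_{n-1} A_{n-2} \cdots A_1$$

are

$$R_n \cdot i_1, \quad R_n \cdot j_1 \quad \text{and} \quad R_n \cdot r_1$$

Thus:

$$\|R_n\|^2 = \text{the (3,3) element of } B_n B_n^T \tag{II.3.1}$$

Now,$^4$

$$B_n = I + A_1 + A_2 A_1 + \ldots + A_{n-1} \cdots A_1 = I + B_{n-1} A_1$$

and consequently

$$B_n B_n^T = I + B_{n-1} A_1 + A_1^T B_{n-1}^T + B_{n-1} B_{n-1}^T$$

(since $A_1$ is orthogonal, $A_1 A_1^T = I$).
Thus

$$E\{(B_n B_n^T)_{33}\} = 1 + E\{(B_{n-1} A_1)_{33}\} + E\{(A_1^T B_{n-1}^T)_{33}\} + E\{(B_{n-1} B_{n-1}^T)_{33}\}$$

Because of our assumption that $\theta_1, \ldots, \theta_{n-1}$ are independent and
uniformly distributed in $(0, 2\pi)$ it follows that

$$E\{B_{n-1} A_1\} = E\{B_{n-1}\}\, E\{A_1\} = E\{A_1\} + E\{A_2\} E\{A_1\} + \ldots + E\{A_{n-1}\} \cdots E\{A_1\} = M + M^2 + \ldots + M^{n-1}$$

where

$$M = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & -\sin\alpha \\ 0 & 0 & \cos\alpha \end{pmatrix}$$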

4 It should, of course, be understood that $B_{n-1}$ is a function of
$\theta_2, \ldots, \theta_{n-1}$ while $B_n$ is a function of $\theta_1, \theta_2, \ldots, \theta_{n-1}$.

A similar result holds, of course, for $E\{A_1^T B_{n-1}^T\}$. Finally, we get
from (II.3.1) the recurrence relation

$$E\{\|R_n\|^2\} = 1 + 2\cos\alpha\, \frac{1 - \cos^{n-1}\alpha}{1 - \cos\alpha} + E\{\|R_{n-1}\|^2\}$$

and hence

$$E\{\|R_n\|^2\} = 1 + (n - 1)\, \frac{1 + \cos\alpha}{1 - \cos\alpha} - 2\cos^2\alpha\, \frac{1 - \cos^{n-1}\alpha}{(1 - \cos\alpha)^2}$$

a formula first derived by Eyring. Experimentally (by light
scattering) it is possible to determine the size of a chain molecule and
it can be argued that it is proportional to $(E\{\|R_n\|^2\})^{1/2}$. It then
appears that the size is proportional to $\sqrt{n}$ or to the square root
of the molecular weight. This simple law has indeed been found
to hold in certain circumstances.
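
The mean square size can be checked in two ways. The sketch below is illustrative (function names, seed, and parameter values are assumptions): it evaluates the closed form, recomputes the same quantity from $E\{r_k \cdot r_l\} = \cos^{|k-l|}\alpha$ (which follows from the (3,3) element of $M^m$ being $\cos^m\alpha$), and estimates it by building random chains with the frame recursion for $i_k$, $j_k$, $r_k$.

```python
import math
import random


def eyring_mean_square(n, alpha):
    """Closed form for E{||R_n||^2} with valence angle alpha (links of length 1)."""
    c = math.cos(alpha)
    return 1 + (n - 1) * (1 + c) / (1 - c) - 2 * c * c * (1 - c ** (n - 1)) / (1 - c) ** 2


def direct_mean_square(n, alpha):
    """Same quantity summed directly from E{r_k . r_l} = cos(alpha)^|k-l|."""
    c = math.cos(alpha)
    return n + 2 * sum((n - m) * c ** m for m in range(1, n))


def simulated_mean_square(n, alpha, trials, seed=0):
    """Monte Carlo estimate using independent uniform rotation angles theta_k."""
    rng = random.Random(seed)
    sa, ca = math.sin(alpha), math.cos(alpha)
    total = 0.0
    for _ in range(trials):
        i, j, r = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
        end = list(r)  # R_1 = r_1
        for _ in range(n - 1):
            t = rng.uniform(0.0, 2 * math.pi)
            st, ct = math.sin(t), math.cos(t)
            new_r = tuple(i[a] * sa * ct + j[a] * sa * st + r[a] * ca for a in range(3))
            new_i = tuple(st * i[a] - ct * j[a] for a in range(3))
            new_j = tuple(ct * ca * i[a] + st * ca * j[a] - sa * r[a] for a in range(3))
            i, j, r = new_i, new_j, new_r
            for a in range(3):
                end[a] += r[a]
        total += sum(x * x for x in end)
    return total / trials


if __name__ == "__main__":
    n, alpha = 20, 1.0
    print("closed form:", eyring_mean_square(n, alpha))
    print("direct sum: ", direct_mean_square(n, alpha))
    print("simulation: ", simulated_mean_square(n, alpha, 4000, seed=1))
```

None of these computations accounts for the excluded volume effect discussed next; they apply only to the unrestricted chain.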

What makes the above treatment unsatisfactory is that one 
neglects entirely the highly significant “excluded volume” effect. 
This effect is due to the necessity of imposing the restriction on the 
chain that 


j 
(11.3.2) Sr, i> 1<i<j<n 


a restriction which expresses the fact that the atoms which “‘sit’’ 
at the ends of the links cannot overlap. 
In trying to calculate $E\{\|R_n\|^2\}$ one must thus integrate only
over that portion of the cube $0 \le \theta_k < 2\pi$ for which (II.3.2) holds.
This is an extremely difficult unsolved problem. 


4. Returning now to the simplified problem (i.e., neglecting the
excluded volume effect) we might wish to determine the distribution
(in the limit $n \to \infty$) of some component of $R_n n^{-1/2}$.

It is actually more efficient to try to determine the joint
distribution of three mutually orthogonal components.

For this purpose we consider:

$$E\{\exp(x n^{-1/2} R_n \cdot a + y n^{-1/2} R_n \cdot b + z n^{-1/2} R_n \cdot c)\}$$

where $a$, $b$, $c$ are three mutually orthogonal unit vectors, or more
generally

$$g_m(x n^{-1/2}, y n^{-1/2}, z n^{-1/2}) = E\{\exp(x n^{-1/2} R_m \cdot a + y n^{-1/2} R_m \cdot b + z n^{-1/2} R_m \cdot c)\}$$


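If, for the sake of simplicity, we take

$$a = i_1, \quad b = j_1, \quad c = r_1$$

we notice that

$$x n^{-1/2} R_m \cdot a + y n^{-1/2} R_m \cdot b + z n^{-1/2} R_m \cdot c$$

is the third component of the vector

$$(I + A_1 + A_2 A_1 + \ldots + A_{m-1} A_{m-2} \cdots A_1)\, \xi$$

Denoting by $\xi$ the vector with components $x n^{-1/2}$, $y n^{-1/2}$, $z n^{-1/2}$, we
can write:$^5$

$$\begin{aligned} g_m(\xi) &= (2\pi)^{-(m-1)} \int_0^{2\pi} \cdots \int_0^{2\pi} \exp\{[(I + A_1 + \ldots + A_{m-1} \cdots A_1)\xi]_3\}\, d\theta_1 \cdots d\theta_{m-1} \\ &= \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} g_{m-1}(A(\theta)\xi)\, d\theta \\ &= \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} g_{m-1}(x n^{-1/2} \sin\theta - y n^{-1/2} \cos\theta,\ x n^{-1/2} \cos\alpha \cos\theta + y n^{-1/2} \cos\alpha \sin\theta - z n^{-1/2} \sin\alpha,\ x n^{-1/2} \sin\alpha \cos\theta + y n^{-1/2} \sin\alpha \sin\theta + z n^{-1/2} \cos\alpha)\, d\theta \end{aligned}$$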


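5 The subscript 3 means that we are taking the third component of
the vector in the exponent.

To simplify calculations let us consider the special case $\alpha = \tfrac{1}{2}\pi$.

The foregoing recurrence formula becomes:

$$g_m(x n^{-1/2}, y n^{-1/2}, z n^{-1/2}) = \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} g_{m-1}(x n^{-1/2} \sin\theta - y n^{-1/2} \cos\theta,\ -z n^{-1/2},\ x n^{-1/2} \cos\theta + y n^{-1/2} \sin\theta)\, d\theta$$

Set $x = (1 - z^2)^{1/2} \cos\beta$, $y = (1 - z^2)^{1/2} \sin\beta$; without loss of
generality we may as well take $x^2 + y^2 + z^2 = 1$, obtaining:

$$\begin{aligned} g_m((1 - z^2)^{1/2} n^{-1/2} \cos\beta,\ (1 - z^2)^{1/2} n^{-1/2} \sin\beta,\ z n^{-1/2}) &= \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} g_{m-1}((1 - z^2)^{1/2} n^{-1/2} \sin(\theta - \beta),\ -z n^{-1/2},\ (1 - z^2)^{1/2} n^{-1/2} \cos(\theta - \beta))\, d\theta \\ &= \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} g_{m-1}((1 - z^2)^{1/2} n^{-1/2} \sin\theta,\ -z n^{-1/2},\ (1 - z^2)^{1/2} n^{-1/2} \cos\theta)\, d\theta \end{aligned}$$

It thus appears that $g_m$ is independent of $\beta$ and hence we can take
$\beta = 0$. This gives

$$g_m((1 - z^2)^{1/2} n^{-1/2},\ 0,\ z n^{-1/2}) = \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} g_{m-1}((1 - z^2)^{1/2} n^{-1/2} \sin\theta,\ -z n^{-1/2},\ (1 - z^2)^{1/2} n^{-1/2} \cos\theta)\, d\theta$$

But

$$g_{m-1}((1 - z^2)^{1/2} n^{-1/2} \sin\theta,\ -z n^{-1/2},\ (1 - z^2)^{1/2} n^{-1/2} \cos\theta) = g_{m-1}(n^{-1/2} [1 - (1 - z^2) \cos^2\theta]^{1/2},\ 0,\ (1 - z^2)^{1/2} n^{-1/2} \cos\theta)$$

and setting

$$g_m((1 - z^2)^{1/2} n^{-1/2},\ 0,\ z n^{-1/2}) = f_m(z),$$

we have

$$f_m(z) = \exp(z n^{-1/2}) (2\pi)^{-1} \int_0^{2\pi} f_{m-1}((1 - z^2)^{1/2} \cos\theta)\, d\theta$$

or by a simple transformation

$$f_m(z) = \exp(z n^{-1/2})\, \pi^{-1} \int_{-(1 - z^2)^{1/2}}^{(1 - z^2)^{1/2}} f_{m-1}(\omega)\, [1 - (z^2 + \omega^2)]^{-1/2}\, d\omega \tag{II.4.1}$$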


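Let the kernel $K(\omega, z)$ be defined in $-1 \le \omega, z \le 1$ as follows:

$$K(\omega, z) = \begin{cases} \pi^{-1} [1 - (z^2 + \omega^2)]^{-1/2}, & z^2 + \omega^2 < 1 \\ 0, & \text{otherwise} \end{cases}$$

Then (II.4.1) can be rewritten in the form

$$f_m(z) = \exp(z n^{-1/2}) \int_{-1}^{1} K(\omega, z)\, f_{m-1}(\omega)\, d\omega \tag{II.4.2}$$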


Unfortunately the kernel $K(z, \omega)$ is singular and a direct
application of perturbation theory may be looked upon with suspicion.
To circumvent this difficulty we note that the iterated kernel

$$K^{(2)}(z, \omega) = \int_{-1}^{1} K(z, u)\, K(u, \omega)\, du$$

is entirely well behaved.

In fact,

$$\int_{-1}^{1} \int_{-1}^{1} [K^{(2)}(z, \omega)]^2\, dz\, d\omega < \infty \tag{II.4.3}$$

To see this we start with an easily verifiable relationship:$^6$

$$\int_{-1}^{1} K(z, \omega)\, P_k(\omega)\, d\omega = i^k A_k P_k(z)$$

where the $P_k$'s are the Legendre polynomials and

$$A_k = (2\pi)^{-1} \int_0^{2\pi} \sin^k\theta\, d\theta$$

It follows immediately that

$$K^{(2)}(z, \omega) \sim \sum_{k=0}^{\infty} (i^k A_k)^2\, \tfrac{1}{2}(2k + 1)\, P_k(z) P_k(\omega)$$

and (II.4.3) follows.
Now, writing the recurrence (II.4.2) in the form:

$$f_m(z) = \exp(z n^{-1/2}) \int_{-1}^{1} K(z, u) \exp(u n^{-1/2}) \int_{-1}^{1} K(u, \omega)\, f_{m-2}(\omega)\, d\omega\, du$$

we can easily symmetrize the operator and again apply the
perturbation theory (exactly as in the two dimensional case). We
must distinguish the cases $n$ even and $n$ odd but this is a very
minor difficulty.

As before we obtain 


f,(2)>exp (— 3072?) 


where for « = 4, o = 1. For « 4 $m the calculations become 
more involved but the method remains in essence unchanged. 


6 This is verified most efficiently by writing it in the equivalent form
$(2\pi)^{-1} \int_0^{2\pi} P_k((1 - z^2)^{1/2} \sin\theta)\, d\theta = i^k A_k P_k(z)$




5. As another illustration of technique often used in probability 
theory let us consider the following problem. 

A point starting from the origin moves along a straight line in 
such a way that during each elementary time interval (of duration 
1) the displacement is either +1 or —1. 

Assuming that $+1$ and $-1$ are equiprobable and that the
displacements are independent, let us determine the asymptotic
properties of the distribution of the number $N_n$ of times the point
is at the origin during the first $n$ steps.

The sample space $S$ is simply a set of $2^n$ "points", each "point"
being a sequence $(X_1, \ldots, X_n)$ of $+1$'s and $-1$'s. The measure
of each "point" is $2^{-n}$ and this is the translation of the statement
that in our random walk the steps are independent and that in
each step the displacements ($+1$ or $-1$) are equiprobable.

Let now

$$V(m) = \begin{cases} 1, & m = 0 \\ 0, & m \ne 0 \end{cases}$$

and note that

$$N_n = V(X_1) + V(X_1 + X_2) + \ldots + V(X_1 + \ldots + X_n) = V(s_1) + V(s_2) + \ldots + V(s_n)$$

where

$$s_k = X_1 + \ldots + X_k$$

Let us now calculate the moments

$$E\{N_n^k\}, \qquad k = 1, 2, \ldots$$

It should be borne in mind that $E\{N_n^k\}$ is simply the sum

$$2^{-n} \sum_{X_j = \pm 1} (V(X_1) + \ldots + V(X_1 + \ldots + X_n))^k$$
1;=41 
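Let us illustrate the principle of the calculation by taking $k = 2$.
We have

$$E\Big\{\Big(\sum_{j=1}^{n} V(s_j)\Big)^2\Big\} = E\Big\{\sum_{j=1}^{n} V^2(s_j)\Big\} + 2E\Big\{\sum_{1 \le i < j \le n} V(s_i) V(s_j)\Big\} = E\{N_n\} + 2 \sum_{1 \le i < j \le n} E\{V(s_i) V(s_j)\}$$

Now,

$$E\{V(s_i) V(s_j)\} = \operatorname{Prob}\{s_i = 0,\ s_j = 0\} = \operatorname{Prob}\{s_i = 0,\ s_j - s_i = 0\} = \operatorname{Prob}\{s_i = 0\} \operatorname{Prob}\{s_{j-i} = 0\}$$

Set:

$$p_i = \operatorname{Prob}\{s_i = 0\}, \qquad i = 1, 2, \ldots$$

and consider:

$$Q_n = \sum_{1 \le i < j \le n} p_i\, p_{j-i}$$

Now,

$$\sum_{n=1}^{\infty} Q_n z^n = (1 - z)^{-1} \Big(\sum_{j=1}^{\infty} p_j z^j\Big)^2$$

and consequently

$$\sum_{n=1}^{\infty} E\{N_n^2\} z^n = (1 - z)^{-1} \Big[\sum_{j=1}^{\infty} p_j z^j + 2 \Big(\sum_{j=1}^{\infty} p_j z^j\Big)^2\Big]$$

(Here we have used the obvious identity

$$\sum_{n=1}^{\infty} E\{N_n\} z^n = (1 - z)^{-1} \sum_{j=1}^{\infty} p_j z^j \ )$$

In our case

$$p_i = \operatorname{Prob}\{s_i = 0\} = \begin{cases} 0 & \text{if } i \text{ is odd} \\ \binom{2m}{m} 2^{-2m} & \text{if } i = 2m \end{cases}$$

or

$$p_i = (2\pi)^{-1} \int_0^{2\pi} \cos^i\theta\, d\theta$$

Thus

$$\sum_{j=1}^{\infty} p_j z^j = (1 - z^2)^{-1/2} - 1$$

and consequently

$$\sum_{n=1}^{\infty} E\{N_n^2\} z^n \sim 2(1 - z)^{-1} (1 - z^2)^{-1} \qquad \text{as } z \to 1$$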


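An easy extension of the preceding calculation gives:

$$\sum_{n=1}^{\infty} E\{N_n^k\} z^n \sim k!\, (1 - z)^{-1} (1 - z^2)^{-k/2} \sim k!\, 2^{-k/2} (1 - z)^{-1} (1 - z)^{-k/2}$$

or

$$\sum_{n=1}^{\infty} (E\{N_n^k\} - E\{N_{n-1}^k\}) z^n \sim k!\, 2^{-k/2} (1 - z)^{-k/2}$$

where we set $N_0 = 0$.

Since $N_n \ge N_{n-1}$ we have $E\{N_n^k\} \ge E\{N_{n-1}^k\}$ and consequently
an application of the Tauberian theorem of Karamata
gives

$$E\{N_n^k\} \sim k!\, 2^{-k/2} n^{k/2} / \Gamma(\tfrac{1}{2}k + 1)$$

In other words:

$$E\{(N_n n^{-1/2})^k\} \to k!\, 2^{-k/2} / \Gamma(\tfrac{1}{2}k + 1); \qquad k = 0, 1, 2, \ldots$$

as $n \to \infty$.
Now it is easily verified that

$$2 (2\pi)^{-1/2} \int_0^{\infty} u^k \exp(-\tfrac{1}{2}u^2)\, du = k!\, 2^{-k/2} / \Gamma(\tfrac{1}{2}k + 1)$$

and consequently setting:

$$\sigma_n(\alpha) = \operatorname{Prob}\{N_n n^{-1/2} < \alpha\}$$

we have

$$\lim_{n\to\infty} \int_0^{\infty} \alpha^k\, d\sigma_n(\alpha) = \lim_{n\to\infty} E\{(N_n n^{-1/2})^k\} = 2 (2\pi)^{-1/2} \int_0^{\infty} u^k \exp(-\tfrac{1}{2}u^2)\, du$$

By the theorem on moments quoted in Section 1 of this chapter
we get:

$$\lim_{n\to\infty} \sigma_n(\alpha) = \lim_{n\to\infty} \operatorname{Prob}\{N_n n^{-1/2} < \alpha\} = 2 (2\pi)^{-1/2} \int_0^{\alpha} \exp(-\tfrac{1}{2}u^2)\, du$$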
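
The first two moments can be computed exactly for finite $n$ and set beside their limits. The sketch below is illustrative (the function names are assumptions): it uses $E\{N_n\} = \sum_{i \le n} p_i$ and $E\{N_n^2\} = E\{N_n\} + 2 \sum_{i<j} p_i p_{j-i}$ and compares $E\{N_n\} n^{-1/2}$ and $E\{N_n^2\} n^{-1}$ with $k!\, 2^{-k/2}/\Gamma(\tfrac{1}{2}k + 1)$ for $k = 1, 2$. The convergence is slow, with an error of order $n^{-1/2}$.

```python
import math


def return_probabilities(n):
    """p[i] = Prob{s_i = 0} for the symmetric +-1 walk, 1 <= i <= n (index 0 unused)."""
    p = [0.0] * (n + 1)
    value = 1.0
    for m in range(1, n // 2 + 1):
        value *= (2 * m - 1) / (2 * m)  # C(2m, m) 2^(-2m) via its ratio recurrence
        p[2 * m] = value
    return p


def first_two_moments(n):
    """Exact E{N_n} and E{N_n^2}, using E{N_n^2} = E{N_n} + 2 * sum_{i<j} p_i p_{j-i}."""
    p = return_probabilities(n)
    cumulative = [0.0] * (n + 1)  # cumulative[m] = p_1 + ... + p_m
    for i in range(1, n + 1):
        cumulative[i] = cumulative[i - 1] + p[i]
    first = cumulative[n]
    pairs = sum(p[i] * cumulative[n - i] for i in range(1, n))
    return first, first + 2 * pairs


def limiting_moment(k):
    """k! 2^(-k/2) / Gamma(k/2 + 1), the k-th moment of the limiting law."""
    return math.factorial(k) * 2 ** (-k / 2) / math.gamma(k / 2 + 1)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        m1, m2 = first_two_moments(n)
        print(n, m1 / math.sqrt(n), limiting_moment(1), m2 / n, limiting_moment(2))
```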


The reader will notice that the foregoing derivation can be
immediately extended to the case where the steps $X_1, X_2, \dots$ are
still assumed to be independent but are capable of assuming arbitrary
integral values.


In fact let

$$\mathrm{Prob}\{X_j = k\} = c_k$$

where $c_k \ge 0$ and $\sum_{k=-\infty}^{\infty} c_k = 1$ and assume, for the sake of simplicity, that $c_{-k} = c_k$.
The sample space $S$ is now a set of sequences

$$(X_1, \dots, X_n)$$

where each $X_j$ can assume any integral value $k$. The measure of
the “point” $(X_1, \dots, X_n)$ is now

$$\mu\{(X_1, \dots, X_n)\} = c_{X_1} \cdots c_{X_n}$$


(the product on the right-hand side reminding us that the steps 
were assumed to be independent!). 
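The only difference is now that $p_j$ is given by the formula

$$p_j = \mathrm{Prob}\{s_j = 0\} = (2\pi)^{-1} \int_0^{2\pi} f^j(\theta)\, d\theta$$

where

$$f(\theta) = \sum_{k=-\infty}^{\infty} c_k \exp(ik\theta)$$

Consequently we would obtain

$$\sum_{n=1}^{\infty} z^n E\{N_n^k\} \sim k!\,(1 - z)^{-1} \Bigl((2\pi)^{-1} \int_0^{2\pi} [1 - z f(\theta)]^{-1}\, d\theta\Bigr)^k$$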


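which for $f(\theta) = \cos\theta = \tfrac12 \exp(i\theta) + \tfrac12 \exp(-i\theta)$ reduces to
the previous case.
If

$$\sigma^2 = E\{X_j^2\} = \sum_{k=-\infty}^{+\infty} k^2 c_k < \infty$$

$f''(0)$ exists (and $f''(0) = -\sigma^2$) and it is easy to see that:

$$\begin{aligned}
(2\pi)^{-1} \int_0^{2\pi} [1 - z f(\theta)]^{-1}\, d\theta &= \pi^{-1} \int_0^{\pi} [1 - z f(\theta)]^{-1}\, d\theta \\
&\sim \pi^{-1} \int_0^{\pi} [1 - z(1 - \tfrac12 \sigma^2 \theta^2)]^{-1}\, d\theta \\
&\sim \sigma^{-1}\, 2^{-1/2} (1 - z)^{-1/2}, \qquad z \to 1
\end{aligned}$$

provided, of course, $f(\theta)$ is different from 1 in the half-open interval
$0 < \theta \le \pi$. Thus, as before, we get

(II.5.1) $$\lim_{n \to \infty} \mathrm{Prob}\{N_n n^{-1/2} < \alpha \sigma^{-1}\} = 2(2\pi)^{-1/2} \int_0^{\alpha} \exp(-\tfrac12 u^2)\, du$$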


Needless to say, in our first example $\sigma = 1$. If $f(\theta)$ becomes
1 somewhere in the half-open interval $0 < \theta \le \pi$, it follows that
the $k$'s for which $c_k \ne 0$ have a common divisor. Denoting the
greatest common divisor by $h$ one sees that $\sigma$ should be replaced
by $\sigma/h$.
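The formula $p_j = (2\pi)^{-1}\int_0^{2\pi} f^j(\theta)\,d\theta$ can be checked numerically against a direct convolution of the step distribution. The following is a minimal sketch in Python, not part of the original argument; the helper names and the particular step distribution are assumptions made only for illustration. Because $f^j$ is a trigonometric polynomial, an equally spaced Riemann sum with more points than its degree reproduces the integral up to rounding.

```python
import cmath
from math import pi

def char_fn(c, theta):
    """f(theta) = sum_k c_k exp(i k theta); c is a dict mapping integer k -> c_k."""
    return sum(ck * cmath.exp(1j * k * theta) for k, ck in c.items())

def return_prob_fourier(c, j, grid=256):
    """(2 pi)^{-1} * integral over [0, 2 pi] of f(theta)^j, by an equally spaced sum."""
    total = sum(char_fn(c, 2 * pi * r / grid) ** j for r in range(grid))
    return (total / grid).real

def return_prob_convolution(c, j):
    """Prob{s_j = 0}, obtained by convolving the step distribution j times."""
    dist = {0: 1.0}
    for _ in range(j):
        new = {}
        for s, ps in dist.items():
            for k, ck in c.items():
                new[s + k] = new.get(s + k, 0.0) + ps * ck
        dist = new
    return dist.get(0, 0.0)

if __name__ == "__main__":
    # A hypothetical symmetric step distribution, chosen only for illustration.
    c = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
    for j in range(1, 7):
        print(j, return_prob_fourier(c, j), return_prob_convolution(c, j))
    # Steps +2, -2 have common divisor 2, and f(theta) = cos(2 theta) equals 1 at theta = pi.
    print(char_fn({-2: 0.5, 2: 0.5}, pi))
```

The last line illustrates the remark about the common divisor: when all steps are multiples of $h = 2$, $f$ returns to the value 1 inside the interval $0 < \theta \le \pi$.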


6. The reader has, no doubt, already noticed that the probability 
language is but a convenient way of stating certain mathematical 
facts about rather complicated sums or integrals. 

Stripped of probabilistic terminology the statement (II.5.1)
is simply that the limit of the sum

$$\sum c_{X_1} c_{X_2} \cdots c_{X_n} \qquad \Bigl(c_k \ge 0,\ \sum c_k = 1\Bigr)$$

extended over those $n$-tuples of integers $(X_1, X_2, \dots, X_n)$ which
have the property that the number of zeros in the sequence
$X_1,\ X_1 + X_2,\ \dots,\ X_1 + \dots + X_n$ is less than $\alpha \sigma^{-1} n^{1/2}$ is asymptotically

$$2(2\pi)^{-1/2} \int_0^{\alpha} \exp(-\tfrac12 u^2)\, du$$


This is a cumbersome statement and as such seemingly devoid of 
any intrinsic interest. 
Does it become more interesting by being written in the form 
(II.5.1)? Or does it only become less cumbersome? 
We are now touching upon controversial questions which bear 
on the relation of mathematics to nonmathematical disciplines. 
If I may borrow an example from a related but different discipline
let me consider the following statement:

Under suitable assumptions (assuring the existence of all
pertinent integrals) if $f(x)$ is even and

$$\varphi(p) = (2\pi)^{-1/2} \int_{-\infty}^{\infty} \exp(ixp)\, f(x)\, dx$$

then:

$$\int_{-\infty}^{\infty} x^2 |f(x)|^2\, dx \int_{-\infty}^{\infty} p^2 |\varphi(p)|^2\, dp \ \ge\ \tfrac14 \int_{-\infty}^{\infty} |f(x)|^2\, dx \int_{-\infty}^{\infty} |\varphi(p)|^2\, dp$$


This is a mildly amusing inequality whose proof is quite easy, 
requiring nothing more than Parseval’s relation and an adroit use 
of Schwarz’s inequality. 

It happens to be, nevertheless, the mathematical formulation 
of the uncertainty principle of quantum mechanics, a principle 
which without doubt has a profound bearing on our thinking about 
natural phenomena. 

There can be no denying that our reaction to a scientific 
statement depends greatly on the context in which it is made. In 
one context it may be merely a true statement and its truth alone 
may be a rather feeble excuse for making it. In another context 
and in a suitable language it may be highly revealing and sug- 
gestive. 

Since probability theory owes both its inception and its 
growth primarily to nonmathematical questions (e.g. games of 
chance, kinetic theory of matter, etc.) the nature of its problems 
has been overwhelmingly influenced by nonmathematical con- 
siderations. 

We shall return to this question in the next chapter in which 
we shall analyze, in some detail, the role of probability in classical 
statistical mechanics. 

To round out this chapter we shall discuss two quite unrelated 
problems both illustrating the advantages of reformulations in the 
language of probability. 


7. The first problem concerns the random walk discussed toward
the end of Section 5.
The steps $X_1, X_2, \dots, X_n$ are independent and we assume as
before that

$$\mathrm{Prob}\{X_j = k\} = \mathrm{Prob}\{X_j = -k\} = c_k$$

We assume furthermore that

$$c_0 \ne 0$$

and

$$\sum_{k=-\infty}^{\infty} |k|\, c_k < \infty$$

The assumption $c_0 \ne 0$ is not essential but will save us some talk
in the sequel.

Now, consider not the whole sample space $S$ (i.e., the set of all
$n$-tuples $(X_1, X_2, \dots, X_n)$ of integers) but the subspace $S^*$ defined
by the condition

$$s_n = X_1 + \dots + X_n = 0$$


Define the measure on $S^*$ as follows: the measure of a point
$(l_1, \dots, l_n)$ in $S^*$ is equal to:

(II.7.1) $$\begin{aligned}
\mu\{(l_1, \dots, l_n)\} &= \frac{\mathrm{Prob}\{X_1 = l_1, \dots, X_n = l_n,\ X_1 + \dots + X_n = 0\}}{\mathrm{Prob}\{X_1 + \dots + X_n = 0\}} \\
&= \frac{c_{l_1} c_{l_2} \cdots c_{l_n}}{\mathrm{Prob}\{s_n = 0\}} = \frac{c_{l_1} \cdots c_{l_{n-1}}\, c_{l_1 + \dots + l_{n-1}}}{\mathrm{Prob}\{s_n = 0\}}\ {}^{7}
\end{aligned}$$

$\mathrm{Prob}\{s_n = 0\}$ is the measure (see the beginning of Section 5) of
the set defined by the condition $s_n = 0$.
Recalling that

$$f(\theta) = \sum_{k=-\infty}^{\infty} c_k \exp(ik\theta)$$

we have:

$$\mathrm{Prob}\{s_n = 0\} = (2\pi)^{-1} \int_0^{2\pi} f^n(\theta)\, d\theta \ne 0$$

The new measure in $S^*$ is known as “conditional probability.”

7 It is clear that $l_n = -(l_1 + \dots + l_{n-1})$.

Consider now the quantity

$$\max(0, s_1, s_2, \dots, s_{n-1}, s_n)$$

restricted to $S^*$ (i.e., it is equal to $\max(0, s_1, \dots, s_{n-1})$) and its
average over $S^*$ (with respect, of course, to the measure (II.7.1)).
This average (known as “conditional expectation”) is denoted by

$$E\{\max(0, s_1, \dots, s_{n-1}) \mid s_n = 0\}$$
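and is simply the sum

$$\sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ l_1 + l_2,\ \dots,\ l_1 + \dots + l_{n-1})\ \mu(l_1, \dots, l_n)$$

The crucial thing is that $\mu$ is a symmetric function of $l_1, \dots, l_n$.

Let

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma_1 & \sigma_2 & \cdots & \sigma_n \end{pmatrix}$$

be a permutation; note that

$$\begin{aligned}
&\sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ \dots,\ l_1 + \dots + l_{n-1})\ \mu(l_1, \dots, l_n) \\
&\quad= \sum_{l_{\sigma_1} + \dots + l_{\sigma_n} = 0} \max(0,\ l_{\sigma_1},\ l_{\sigma_1} + l_{\sigma_2},\ \dots,\ l_{\sigma_1} + \dots + l_{\sigma_{n-1}})\ \mu(l_{\sigma_1}, \dots, l_{\sigma_n}) \\
&\quad= \sum_{l_1 + \dots + l_n = 0} \max(0,\ l_{\sigma_1},\ \dots,\ l_{\sigma_1} + \dots + l_{\sigma_{n-1}})\ \mu(l_1, \dots, l_n)
\end{aligned}$$

and consequently

$$\begin{aligned}
&\sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ \dots,\ l_1 + \dots + l_{n-1})\ \mu(l_1, \dots, l_n) \\
&\quad= \frac{1}{n!} \sum_{l_1 + \dots + l_n = 0} \Bigl(\sum_{\sigma} \max(0,\ l_{\sigma_1},\ \dots,\ l_{\sigma_1} + \dots + l_{\sigma_{n-1}})\Bigr)\, \mu(l_1, \dots, l_n)
\end{aligned}$$

where the inner summation is over all permutations $\sigma$.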
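We now prove the following lemma first stated and proved
by G. A. Hunt.

Lemma: Let $a_1, \dots, a_n$ be real numbers and let $\sigma$ be the
permutation

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma_1 & \sigma_2 & \cdots & \sigma_n \end{pmatrix}$$

Let furthermore $N(\sigma)$ be the number of positive elements in the
sequence

$$a_{\sigma_1},\ a_{\sigma_1} + a_{\sigma_2},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_n}$$

Then:

$$\sum_{\sigma} \max(0,\ a_{\sigma_1},\ a_{\sigma_1} + a_{\sigma_2},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_n}) = \sum_{\sigma} a_{\sigma_1} N(\sigma)$$

We give an extremely simple proof due to F. J. Dyson.
Let

$$\theta(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$

and observe that

(II.7.2) $$\begin{aligned}
&\max(0,\ a_{\sigma_1},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_k}) - \max(0,\ a_{\sigma_1},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_{k-1}}) \\
&\quad= \theta(a_{\sigma_1} + \dots + a_{\sigma_k}) \bigl\{a_{\sigma_1} + \max(0,\ a_{\sigma_2},\ a_{\sigma_2} + a_{\sigma_3},\ \dots,\ a_{\sigma_2} + \dots + a_{\sigma_k}) \\
&\qquad\qquad - \max(0,\ a_{\sigma_1},\ a_{\sigma_1} + a_{\sigma_2},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_{k-1}})\bigr\}
\end{aligned}$$

Denote by

$$G(\omega_1, \dots, \omega_k)$$

the set of those permutations whose first $k$ indices (in some order)
are $\omega_1, \dots, \omega_k$. Thus the set of all permutations of $n$ letters can be
decomposed into ${}^{n}C_k$ disjoint sets $G(\omega_1, \dots, \omega_k)$. Summing (II.7.2)
over all permutations in $G(\omega_1, \dots, \omega_k)$ and then over all $G$'s we
get:

(II.7.3) $$\begin{aligned}
&\sum_{\sigma} \bigl[\max(0,\ a_{\sigma_1},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_k}) - \max(0,\ a_{\sigma_1},\ \dots,\ a_{\sigma_1} + \dots + a_{\sigma_{k-1}})\bigr] \\
&\quad= \sum_{\sigma} a_{\sigma_1}\, \theta(a_{\sigma_1} + \dots + a_{\sigma_k})
\end{aligned}$$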




Summing (II.7.3) over k from 1 to n we clearly get the conclusion 
of the lemma. 
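The lemma lends itself to a brute-force check for small $n$. The sketch below, in Python, is offered only as an illustration; the function name `lemma_sides` is ours. It enumerates all $n!$ permutations of a given list of numbers and accumulates both sides of the identity.

```python
from itertools import permutations

def lemma_sides(a):
    """Return (sum over sigma of max(0, partial sums), sum over sigma of a_{sigma_1} N(sigma))."""
    lhs = 0
    rhs = 0
    for perm in permutations(a):
        partial = 0
        running_max = 0
        positives = 0          # N(sigma): number of positive partial sums
        for x in perm:
            partial += x
            running_max = max(running_max, partial)
            if partial > 0:
                positives += 1
        lhs += running_max
        rhs += perm[0] * positives
    return lhs, rhs

if __name__ == "__main__":
    for a in [(3, -1), (1, -2), (2, -3, 1, 4, -5)]:
        print(a, lemma_sides(a))
```

For $a = (3, -1)$ the two orderings contribute $3 + 2$ on the left and $6 - 1$ on the right, so both sides equal 5.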
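Using the result of the lemma and (II.7.1) we have

$$\sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ \dots,\ l_1 + \dots + l_{n-1})\ \mu(l_1, \dots, l_n) = \frac{1}{n!} \sum_{l_1 + \dots + l_n = 0} \Bigl\{\sum_{\sigma} l_{\sigma_1} N(\sigma)\Bigr\}\, \mu(l_1, \dots, l_n)$$

where $N(\sigma)$ is now the number of positive elements in the sequence
$l_{\sigma_1},\ l_{\sigma_1} + l_{\sigma_2},\ \dots,\ l_{\sigma_1} + \dots + l_{\sigma_n}$.
But

$$\sum_{l_1 + \dots + l_n = 0} \Bigl(\sum_{\sigma} l_{\sigma_1} N(\sigma)\Bigr)\, \mu(l_1, \dots, l_n) = \sum_{\sigma} \sum_{l_1 + \dots + l_n = 0} l_{\sigma_1} N(\sigma)\, \mu(l_1, \dots, l_n)$$

and clearly

$$\begin{aligned}
\sum_{l_1 + \dots + l_n = 0} l_{\sigma_1} N(\sigma)\, \mu(l_1, \dots, l_n)
&= \sum_{l_1 + \dots + l_n = 0} l_1 \Bigl(\sum_{k=1}^{n} \theta(l_1 + \dots + l_k)\Bigr)\, \mu(l_1, \dots, l_n) \\
&= \sum_{k=1}^{n}\ \sum_{l_1 + \dots + l_n = 0} \theta(l_1 + \dots + l_k)\, l_1\, \mu(l_1, \dots, l_n) \\
&= \sum_{k=1}^{n}\ \sum_{\substack{l_1 + \dots + l_k > 0 \\ l_1 + \dots + l_n = 0}} l_1\, \mu(l_1, \dots, l_n)
\end{aligned}$$

Furthermore

$$\begin{aligned}
\sum_{\substack{l_1 + \dots + l_k > 0 \\ l_1 + \dots + l_n = 0}} l_1\, \mu(l_1, \dots, l_n)
&= k^{-1} \sum_{\substack{l_1 + \dots + l_k > 0 \\ l_1 + \dots + l_n = 0}} (l_1 + \dots + l_k)\, \mu(l_1, \dots, l_n) \\
&= k^{-1} \sum_{l=1}^{\infty} l\ \mathrm{Prob}\{s_k = l,\ s_n = 0\} / \mathrm{Prob}\{s_n = 0\}\ {}^{8}
\end{aligned}$$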


8 It should be noted that

$$\sum_{l=1}^{\infty} l\ \mathrm{Prob}\{s_k = l,\ s_n = 0\} \le \sum_{l=1}^{\infty} l\ \mathrm{Prob}\{s_k = l\} \le E\{|X_1 + \dots + X_k|\} \le k E\{|X_1|\} = k \sum_{l=-\infty}^{+\infty} |l|\, c_l < \infty$$


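Using the easily verified relation

$$\begin{aligned}
\mathrm{Prob}\{s_k = l,\ s_n = 0\} &= \mathrm{Prob}\{s_k = l,\ s_n - s_k = -l\} \\
&= (2\pi)^{-1} \int_0^{2\pi} \exp(-il\theta)\, f^k(\theta)\, d\theta \cdot (2\pi)^{-1} \int_0^{2\pi} \exp(il\theta)\, f^{n-k}(\theta)\, d\theta
\end{aligned}$$

we finally obtain

$$\begin{aligned}
&\mathrm{Prob}\{s_n = 0\}\ E\{\max(0, s_1, \dots, s_{n-1}) \mid s_n = 0\} \\
&\quad= \sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ \dots,\ l_1 + \dots + l_{n-1})\ c_{l_1} \cdots c_{l_n} \\
&\quad= \sum_{k=1}^{n-1} k^{-1} \sum_{l=1}^{\infty} l\ (2\pi)^{-1} \int_0^{2\pi} \exp(-il\theta)\, f^k(\theta)\, d\theta \cdot (2\pi)^{-1} \int_0^{2\pi} \exp(il\theta)\, f^{n-k}(\theta)\, d\theta
\end{aligned}$$

Using the fact that

$$\tfrac12 [1/k + 1/(n - k)] = \tfrac12 n\, [k(n - k)]^{-1}$$

and that $f(\theta)$ is even (so that changing $l$ into $-l$ in $\exp(il\theta)$ will
not affect the value of the integral) we get the curious identity

(II.7.4) $$\begin{aligned}
&\sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ l_1 + l_2,\ \dots,\ l_1 + \dots + l_{n-1})\ c_{l_1} \cdots c_{l_n} \\
&\quad= \tfrac12 n \sum_{k=1}^{n-1} \sum_{l=1}^{\infty} l\, [k(n - k)]^{-1} \Bigl[(2\pi)^{-1} \int_0^{2\pi} f^k(\theta) \exp(il\theta)\, d\theta\Bigr] \Bigl[(2\pi)^{-1} \int_0^{2\pi} f^{n-k}(\theta) \exp(il\theta)\, d\theta\Bigr]
\end{aligned}$$

where, of course

$$f(\theta) = \sum_{k=-\infty}^{+\infty} c_k \exp(ik\theta)$$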
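The identity (II.7.4) can be verified exactly for a finitely supported symmetric step distribution. The following Python sketch is an illustration under stated assumptions: the function names are ours, exact rational arithmetic is used, and the Fourier integral $(2\pi)^{-1}\int_0^{2\pi} f^k(\theta)\exp(il\theta)\,d\theta$ is replaced by its probabilistic meaning $\mathrm{Prob}\{s_k = l\}$ (valid because $c_{-k} = c_k$).

```python
from fractions import Fraction
from itertools import product

def walk_dist(c, k):
    """Distribution of s_k as a dict l -> Prob{s_k = l}; c maps step -> probability."""
    dist = {0: Fraction(1)}
    for _ in range(k):
        new = {}
        for s, ps in dist.items():
            for step, cs in c.items():
                new[s + step] = new.get(s + step, Fraction(0)) + ps * cs
        dist = new
    return dist

def lhs_sum(c, n):
    """Sum over l_1+...+l_n = 0 of max(0, l_1, ..., l_1+...+l_{n-1}) c_{l_1} ... c_{l_n}."""
    total = Fraction(0)
    for steps in product(c, repeat=n):
        if sum(steps) != 0:
            continue
        partial, best, weight = 0, 0, Fraction(1)
        for l in steps:
            partial += l
            best = max(best, partial)   # the final partial sum is 0 and changes nothing
            weight *= c[l]
        total += best * weight
    return total

def rhs_sum(c, n):
    """(n/2) * sum_{k=1}^{n-1} sum_{l>=1} l [k(n-k)]^{-1} Prob{s_k = l} Prob{s_{n-k} = l}."""
    total = Fraction(0)
    for k in range(1, n):
        dk, dnk = walk_dist(c, k), walk_dist(c, n - k)
        for l, pk in dk.items():
            if l >= 1:
                total += Fraction(l, k * (n - k)) * pk * dnk.get(l, Fraction(0))
    return Fraction(n, 2) * total

if __name__ == "__main__":
    simple = {-1: Fraction(1, 2), 1: Fraction(1, 2)}
    print(lhs_sum(simple, 4), rhs_sum(simple, 4))
```

For the walk with equiprobable steps $\pm 1$ and $n = 4$, the six closed paths have maxima 2, 1, 1, 1, 0, 0, so the left-hand side is $5/16$, which agrees with the right-hand side.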


At this point the reader may justly wonder as to why he has 
been exposed to this display of combinatorial trickery. 

We ask him to be patient a while longer, and the point we are 
trying to make should become clear. 

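We turn now to a different problem. Consider the
$(m + 1) \times (m + 1)$ Toeplitz matrix of $f(\theta)$, i.e., the matrix

$$C_m = ((c_{ij})) = ((c_{i-j})), \qquad 0 \le i, j \le m$$

Denote by

$$\lambda_1(m),\ \lambda_2(m),\ \dots,\ \lambda_{m+1}(m)$$

its eigenvalues.
We then have the following theorem:

(II.7.5) $$\begin{aligned}
&\lim_{m \to \infty} \Bigl\{\sum_{j=1}^{m+1} \lambda_j^n(m) - (m + 1)(2\pi)^{-1} \int_0^{2\pi} f^n(\theta)\, d\theta\Bigr\} \\
&\quad= -2 \sum_{l_1 + \dots + l_n = 0} \max(0,\ l_1,\ l_1 + l_2,\ \dots,\ l_1 + \dots + l_{n-1})\ c_{l_1} \cdots c_{l_n}
\end{aligned}$$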


Before we prove this statement (and the proof is quite simple)
let us observe that in conjunction with the identity (II.7.4) it
implies the following remarkable result of Szegö: 9

For sufficiently small real $\xi$ ($I$ being the identity matrix)

(II.7.6) $$\lim_{m \to \infty} \frac{\det(I - \xi C_m)}{\exp\bigl[\tfrac12 (m + 1)\pi^{-1} \int_0^{2\pi} \log(1 - \xi f(\theta))\, d\theta\bigr]} = \exp\Bigl(\tfrac14 \sum_{n=1}^{\infty} n\, |h_n|^2\Bigr)$$

where

$$\sum_{n=0}^{\infty} h_n z^n = (2\pi)^{-1} \int_0^{2\pi} \log[1 - \xi f(\theta)]\ [1 + z \exp(-i\theta)]\ [1 - z \exp(-i\theta)]^{-1}\, d\theta$$

This purely analytic result (first proved by Szegö by a highly
ingenious use of orthogonal polynomials) is thus equivalent to the
probabilistic result:

(II.7.7) $$\mathrm{Prob}\{s_n = 0\}\ E\{\max(0, s_1, \dots, s_{n-1}) \mid s_n = 0\} = \sum_{k=1}^{n-1} k^{-1} \sum_{l=1}^{\infty} l\ \mathrm{Prob}\{s_k = l,\ s_n = 0\}$$

which in turn is implied by the foregoing purely combinatorial
lemma.10

9 Actually, Szegö’s result is stated in a slightly different but equivalent
way.

Here then we have a remarkable example of a probabilistic 
result yielding a highly nontrivial analytical conclusion! 

In Chapter IV we shall see several other examples, of an 
entirely different nature, where results probabilistically motivated 
and proved by probabilistic methods lead to results in classical 
analysis. 

It remains now to prove (II.7.5). We carry out the proof for
$n = 3$ inasmuch as it contains all the features of the general case.

We have

$$\sum_{j=1}^{m+1} \lambda_j^3(m) = \mathrm{trace}\{C_m^3\} = \sum_{0 \le i_1, i_2, i_3 \le m} c_{i_1 - i_2}\, c_{i_2 - i_3}\, c_{i_3 - i_1}$$

Setting $\psi(i) = 1$, $0 \le i \le m$, and $\psi(i) = 0$ otherwise, we
have

$$\sum_{j=1}^{m+1} \lambda_j^3(m) = \sum \psi(i_1)\, \psi(i_2)\, \psi(i_3)\ c_{i_1 - i_2}\, c_{i_2 - i_3}\, c_{i_3 - i_1}$$

where the summation is now extended over all values of $i_1, i_2, i_3$.
Introducing the new variables

$$i = i_1, \qquad j_2 = i_2 - i_1, \qquad j_3 = i_3 - i_2$$

we get

$$\sum_{j=1}^{m+1} \lambda_j^3(m) = \sum_{i,\, j_2,\, j_3} \psi(i)\, \psi(i + j_2)\, \psi(i + j_2 + j_3)\ c_{j_2}\, c_{j_3}\, c_{j_2 + j_3}$$

Observe that

$$\sum_{i} \psi(i)\, \psi(i + j_2)\, \psi(i + j_2 + j_3) = (m + 1) - \{\max(0,\ j_2,\ j_2 + j_3) - \min(0,\ j_2,\ j_2 + j_3)\}$$

as long as the expression on the right-hand side is nonnegative,
and 0 otherwise.

The desired conclusion follows now by noticing that

$$\sum_{j_2,\, j_3} c_{j_2}\, c_{j_3}\, c_{j_2 + j_3} = (2\pi)^{-1} \int_0^{2\pi} f^3(\theta)\, d\theta$$

and by using the symmetry assumption $c_{-j} = c_j$.

10 For the sake of historical accuracy and to disabuse the reader of
the notion that “black magic” is being practiced in this branch of
mathematics we may as well tell the truth. Starting with Szegö’s result, I was
able to prove that it is equivalent to the identity (II.7.7). Once the
reduction was accomplished and the truth of (II.7.7) firmly established
the purely combinatorial proof (based on the lemma) was supplied by
G. A. Hunt.

Here I have rearranged the proof, with malice aforethought, to achieve
a certain dramatic effect.

However, quite independently, and only a little later F. Spitzer (partly
in collaboration with Bohnenblust) was led by probabilistic considerations
to a vast generalization of the combinatorial lemma which he then applied
to many problems. Spitzer’s work is, in my opinion, one of the most
important contributions to probability theory in recent years.
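A slight refinement of the foregoing argument yields also the
estimate

$$\Bigl|\sum_{j=1}^{m+1} \lambda_j^n(m) - (m + 1)(2\pi)^{-1} \int_0^{2\pi} f^n(\theta)\, d\theta\Bigr| \le 2 \sum_{l_1 + \dots + l_n = 0} c_{l_1} \cdots c_{l_n}\ \max(0,\ l_1,\ l_1 + l_2,\ \dots,\ l_1 + \dots + l_{n-1})$$

which is useful in justifying certain formal steps leading from
(II.7.5) to Szegö’s result.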
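For a step distribution of finite range the counting argument above shows that the difference in (II.7.5) attains its limiting value as soon as $m + 1$ is at least as large as the largest possible value of $\max - \min$ along a closed path. This can be seen in a small exact computation. The Python sketch below is only an illustration; the function names and the sample distribution are our own assumptions, and $(2\pi)^{-1}\int_0^{2\pi} f^n(\theta)\,d\theta$ is computed as $\mathrm{Prob}\{s_n = 0\}$.

```python
from fractions import Fraction
from itertools import product

def toeplitz(c, m):
    """(m+1) x (m+1) matrix with entries c_{i-j}; c maps integer k -> c_k."""
    return [[c.get(i - j, Fraction(0)) for j in range(m + 1)] for i in range(m + 1)]

def trace_of_power(mat, n):
    """trace(mat^n) by repeated exact matrix multiplication (n >= 1)."""
    size = len(mat)
    power = mat
    for _ in range(n - 1):
        power = [[sum(power[i][k] * mat[k][j] for k in range(size))
                  for j in range(size)] for i in range(size)]
    return sum(power[i][i] for i in range(size))

def closed_path_sums(c, n):
    """Return (Prob{s_n = 0}, sum over closed paths of max(0, partial sums) * weight)."""
    prob_zero = Fraction(0)
    max_sum = Fraction(0)
    for steps in product(c, repeat=n):
        if sum(steps) != 0:
            continue
        weight = Fraction(1)
        partial = best = 0
        for l in steps:
            weight *= c[l]
            partial += l
            best = max(best, partial)
        prob_zero += weight
        max_sum += best * weight
    return prob_zero, max_sum

def trace_defect(c, n, m):
    """trace(C_m^n) - (m + 1) * Prob{s_n = 0}."""
    prob_zero, _ = closed_path_sums(c, n)
    return trace_of_power(toeplitz(c, m), n) - (m + 1) * prob_zero

if __name__ == "__main__":
    lazy = {-1: Fraction(1, 4), 0: Fraction(1, 2), 1: Fraction(1, 4)}
    for n in (2, 3, 4):
        print(n, trace_defect(lazy, n, 6), -2 * closed_path_sums(lazy, n)[1])
```

With steps confined to $-1, 0, 1$ a closed path of $n$ steps has $\max - \min \le n/2$, so $m = 6$ is already large enough for $n \le 4$ and the two printed columns coincide.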


8. The analysis of the preceding section immediately suggests a
continuous analog.

Without going into details we simply state the pertinent facts
inviting the reader to verify them.

Let $p(x)$ be such that

1. $p(x) \ge 0$;
2. $p(x) = p(-x)$;
3. $\int_{-\infty}^{\infty} p(x)\, dx = 1$;

and set

$$F(\eta) = \int_{-\infty}^{+\infty} \exp(i\eta x)\, p(x)\, dx$$

Assume furthermore that

$$\int_{-\infty}^{\infty} |F(\eta)|\, d\eta < \infty$$


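Consider the integral equation

(II.8.1) $$\int_0^{a} p(x - y)\, \varphi(y)\, dy = \lambda \varphi(x)$$

and denote its eigenvalues by $\lambda_1(a), \lambda_2(a), \dots$.
Finally, write

$$p^{(n)}(x) = (2\pi)^{-1} \int_{-\infty}^{\infty} F^n(\eta) \exp(i\eta x)\, d\eta$$

i.e., $p^{(n)}(x)$ is the $n$-fold convolution of $p$ with itself.
Then,

$$\begin{aligned}
&\lim_{a \to \infty} \Bigl\{\sum_{j} \lambda_j^n(a) - \tfrac12 a \pi^{-1} \int_{-\infty}^{\infty} F^n(\eta)\, d\eta\Bigr\} \\
&\quad= -2 \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \max(0,\ x_1,\ x_1 + x_2,\ \dots,\ x_1 + \dots + x_{n-1}) \\
&\qquad\qquad \cdot\ p(x_1) \cdots p(x_{n-1})\ p(x_1 + \dots + x_{n-1})\ dx_1 \cdots dx_{n-1}
\end{aligned}$$

(analog of (II.7.5)) and

$$\begin{aligned}
&\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \max(0,\ x_1,\ x_1 + x_2,\ \dots,\ x_1 + \dots + x_{n-1}) \\
&\qquad\qquad \cdot\ p(x_1) \cdots p(x_{n-1})\ p(x_1 + \dots + x_{n-1})\ dx_1 \cdots dx_{n-1} \\
&\quad= \tfrac12 n \sum_{k=1}^{n-1} \int_0^{\infty} x\, [k(n - k)]^{-1}\ p^{(k)}(x)\, p^{(n-k)}(x)\, dx
\end{aligned}$$

(analog of (II.7.4)).
Finally for sufficiently small $\xi$

$$\lim_{a \to \infty} \frac{D_a(\xi)}{\exp\bigl[\tfrac12 a \pi^{-1} \int_{-\infty}^{\infty} \log(1 - \xi F(\eta))\, d\eta\bigr]} = \exp\Bigl\{\int_0^{\infty} x\, \Bigl[(2\pi)^{-1} \int_{-\infty}^{\infty} \log(1 - \xi F(\eta)) \exp(i\eta x)\, d\eta\Bigr]^2 dx\Bigr\}$$

(analog of (II.7.6), $D_a(\xi)$ being the Fredholm determinant of
(II.8.1)).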

Here in an admittedly special example we see clearly the 
value of reformulations. Were it not for the reformulation of 
Szegö’s result in the language of probability the extensions to 
integral equations would be quite difficult if not impossible. 


9. For another example of advantages resulting from reformulations
we turn to number theory.

Consider the function $\nu(n)$ (already defined in Example 3 of
Chapter I) which represents the number of prime divisors (not
counting multiplicity) of $n$.

Let $p$ be a prime and

$$\rho_p(n) = \begin{cases} 1, & p \mid n \\ 0, & p \nmid n \end{cases}$$

Then

$$\nu(n) = \sum_{p} \rho_p(n)$$

The functions $\rho_p(n)$ ($p$ running through the primes) are
independent, i.e.,

$$D\{\rho_{p_1}(n) = \varepsilon_1, \dots, \rho_{p_k}(n) = \varepsilon_k\} = \prod_{j=1}^{k} D\{\rho_{p_j}(n) = \varepsilon_j\}$$

where each $\varepsilon$ is either 0 or 1 and $D\{\ \}$ denotes (as in Example 3 of
Chapter I) the density of the set of integers defined inside the
braces.
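The independence of the functions $\rho_p(n)$ is easily observed on a finite range of integers. The Python sketch below is a minimal illustration (the function names are ours). It counts the integers $n \le N$ with prescribed values of $\rho_p(n)$ for a few primes and compares the relative frequency with the product of the individual densities $1/p$ or $1 - 1/p$. When $N$ is a multiple of the product of the primes involved the agreement is exact, since the pattern of divisibility is periodic.

```python
from fractions import Fraction

def rho(p, n):
    """rho_p(n) = 1 if p divides n, and 0 otherwise."""
    return 1 if n % p == 0 else 0

def joint_frequency(primes, eps, N):
    """Fraction of n in 1..N with rho_p(n) = eps for each listed prime p."""
    hits = sum(1 for n in range(1, N + 1)
               if all(rho(p, n) == e for p, e in zip(primes, eps)))
    return Fraction(hits, N)

def product_of_densities(primes, eps):
    """Product over the listed primes of D{rho_p(n) = eps}."""
    result = Fraction(1)
    for p, e in zip(primes, eps):
        result *= Fraction(1, p) if e == 1 else 1 - Fraction(1, p)
    return result

if __name__ == "__main__":
    primes, eps = (2, 3, 5, 7), (1, 0, 1, 0)
    print(joint_frequency(primes, eps, 2100), product_of_densities(primes, eps))
```

Here $N = 2100$ is a multiple of $2 \cdot 3 \cdot 5 \cdot 7 = 210$, and both quantities equal $\tfrac12 \cdot \tfrac23 \cdot \tfrac15 \cdot \tfrac67 = 2/35$.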


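Set now

$$\nu_k(n) = \sum_{p \le p_k} \rho_p(n), \qquad A_k = \sum_{p \le p_k} p^{-1}$$

and denote by $\sigma_k(\alpha)$ the density of integers $n$ for which

$$(\nu_k(n) - A_k)\, A_k^{-1/2} < \alpha$$

i.e.,

$$\sigma_k(\alpha) = D\{(\nu_k(n) - A_k)\, A_k^{-1/2} < \alpha\}$$

Now,

$$\begin{aligned}
\int_{-\infty}^{\infty} \exp(i\xi\alpha)\, d\sigma_k(\alpha) &= M\{\exp[i\xi(\nu_k(n) - A_k) A_k^{-1/2}]\} \\
&= \lim_{N \to \infty} N^{-1} \sum_{n=1}^{N} \exp[i\xi(\nu_k(n) - A_k) A_k^{-1/2}]
\end{aligned}$$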
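Using the property of independence we get (as in Example 3 of
Chapter I)

$$\begin{aligned}
M\{\exp[i\xi(\nu_k(n) - A_k) A_k^{-1/2}]\} &= \exp(-i\xi A_k^{1/2}) \prod_{j=1}^{k} M\{\exp[i\xi A_k^{-1/2} \rho_{p_j}(n)]\} \\
&= \exp(-i\xi A_k^{1/2}) \prod_{j=1}^{k} [1 - p_j^{-1} + p_j^{-1} \exp(i\xi A_k^{-1/2})]
\end{aligned}$$

It is now easy to verify that

$$\lim_{k \to \infty} M\{\exp[i\xi(\nu_k(n) - A_k) A_k^{-1/2}]\} = \exp(-\tfrac12 \xi^2)$$

and hence it follows that

$$\lim_{k \to \infty} \sigma_k(\alpha) = (2\pi)^{-1/2} \int_{-\infty}^{\alpha} \exp(-\tfrac12 u^2)\, du$$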
A reader familiar with probability theory will not fail to 
notice that we have proved a very special case of the “central 
limit theorem.” 
Now denote by $K_{N,k}(\alpha)$ the number of integers $n$, $1 \le n \le N$
for which

$$(\nu_k(n) - A_k)\, A_k^{-1/2} < \alpha$$

then what we have proved is

(II.9.1) $$\lim_{k \to \infty} \lim_{N \to \infty} N^{-1} K_{N,k}(\alpha) = (2\pi)^{-1/2} \int_{-\infty}^{\alpha} \exp(-\tfrac12 u^2)\, du$$

This suggests that perhaps also

$$\lim_{N \to \infty} N^{-1} K_{N,N}(\alpha) = (2\pi)^{-1/2} \int_{-\infty}^{\alpha} \exp(-\tfrac12 u^2)\, du$$

or in other words (since $A_N = \log\log N + O(1)$), if $M_N$ is the
number of integers up to $N$ whose number of prime divisors is less
than $\log\log N + \alpha(\log\log N)^{1/2}$, then

$$N^{-1} M_N \to (2\pi)^{-1/2} \int_{-\infty}^{\alpha} \exp(-\tfrac12 u^2)\, du$$

This is indeed so although the proof is not easy and requires relatively
delicate number-theoretic (not probabilistic!) arguments.


CHAPTER III 


Probability in Some Problems of Classical 
Statistical Mechanics 


1. Although probability arguments were used in kinetic theory 
of gases (notably by Clausius) from the very beginning it was not 
until the great work of Maxwell and Boltzmann (from about 1870 
on) that the question of the nature and role of probabilistic reason- 
ing in Physics was brought into sharp focus. 

To understand and appreciate some of the difficulties and 
problems connected with the Maxwell-Boltzmann approach let 
us review briefly the classical derivation of the Boltzmann equa- 
tion. 

Boltzmann considered an assembly of $N$ particles considered
as hard spheres of diameter $\delta$, enclosed in a volume $V$. Assuming
spatial homogeneity, he then considered particles “A” (those
whose velocities are $\mathbf{v}$ within the differential element $d\mathbf{v}$) and
particles “B” (those whose velocities are $\mathbf{w}$ within $d\mathbf{w}$).

He then proceeded to calculate the number of collisions between
particles “A” and “B” which take place during time $dt$ and
are such that the center line at the time of collision is given by the
unit vector $\mathbf{l}$ within the surface element $d\mathbf{l}$ (on the unit sphere).11

The calculation (which can be found in every textbook on
kinetic theory of gases) is made as follows.

In order that a particle “A” suffer a collision of the aforementioned
type with a particle “B,” one must have

(III.1.1) $$(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l} < 0$$

and the center of the “B” particle must be in a skew cylinder
(“collision cylinder”) whose base $\delta^2\, d\mathbf{l}$ is on a sphere of radius $\delta$
and whose axis is in the direction of $\mathbf{w} - \mathbf{v}$ and of length $|\mathbf{w} - \mathbf{v}|\, dt$.

11 One might call these $(\mathbf{v}, \mathbf{w}, \mathbf{l})$ collisions.


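To take into account condition (III.1.1) we can say that for a
collision of the specified type to occur the center of a “B” particle
must be in a cylinder of volume

(III.1.2) $$\delta^2\, \tfrac12 \bigl(|(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}| - (\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\bigr)\, d\mathbf{l}\, dt$$

Denoting by $n_A$ and $n_B$ the numbers of particles of class “A” and
“B” respectively we have (by assumption of spatial homogeneity)
that the number of collisions of the desired type is

(III.1.3) $$n_A n_B\, \delta^2\, V^{-1}\, \tfrac12 \bigl(|(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}| - (\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\bigr)\, d\mathbf{l}\, dt$$

Furthermore one assumes that

(III.1.4) $$n_A = N f(\mathbf{v}; t)\, d\mathbf{v}, \qquad n_B = N f(\mathbf{w}; t)\, d\mathbf{w}$$

After a collision of the type just specified has taken place
$\mathbf{v}$ and $\mathbf{w}$ are changed into

(III.1.5) $$\mathbf{v} + \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\, \mathbf{l} \qquad \text{and} \qquad \mathbf{w} - \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\, \mathbf{l}$$

respectively (this is calculated on the basis of conservation of
momentum and energy in an elastic collision).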
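The transformation (III.1.5) can be checked directly. The Python sketch below is an illustration only (the function names are ours); it applies the collision rule with exact rational arithmetic and confirms conservation of momentum and of kinetic energy for particles of equal mass, as well as the fact that applying the rule twice restores the original velocities. The unit vector $(2/3, 1/3, 2/3)$ is chosen merely because its components are rational.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def collide(v, w, l):
    """(v, w) -> (v + ((w - v).l) l, w - ((w - v).l) l) for a unit vector l."""
    s = dot([wi - vi for vi, wi in zip(v, w)], l)
    v_new = tuple(vi + s * li for vi, li in zip(v, l))
    w_new = tuple(wi - s * li for wi, li in zip(w, l))
    return v_new, w_new

if __name__ == "__main__":
    l = (Fraction(2, 3), Fraction(1, 3), Fraction(2, 3))   # a unit vector
    v, w = (1, 0, -2), (-3, 5, 4)
    v1, w1 = collide(v, w, l)
    print("momentum conserved:", tuple(a + b for a, b in zip(v1, w1)) ==
          tuple(a + b for a, b in zip(v, w)))
    print("energy conserved:  ", dot(v1, v1) + dot(w1, w1) == dot(v, v) + dot(w, w))
    print("involution:        ", collide(v1, w1, l) == (v, w))
```

The same checks succeed with $\mathbf{l}$ replaced by $-\mathbf{l}$, since the rule depends on $\mathbf{l}$ only through the product $\{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\,\mathbf{l}$.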
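It is now a simple matter to write down the equation,

(III.1.6) $$\frac{\partial f(\mathbf{v}; t)}{\partial t} = \frac{N \delta^2}{2V} \int d\mathbf{w} \int_{S(1)} d\mathbf{l}\ \{f' f_1' - f f_1\}\ |(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}|$$

where

$$f = f(\mathbf{v}; t), \qquad f_1 = f(\mathbf{w}; t)$$

$$f' = f(\mathbf{v} + \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\, \mathbf{l};\ t), \qquad f_1' = f(\mathbf{w} - \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\, \mathbf{l};\ t)$$

$S(1)$ = surface of the unit sphere.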
2. The preceding derivation appears quite innocent and the
resulting equation (III.1.6) (which is simply a conservation equation
expressing the fact that $n_A$ is depleted by $(\mathbf{v}, \mathbf{w}, \mathbf{l})$ collisions
and augmented by $(\mathbf{v} + \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\mathbf{l},\ \mathbf{w} - \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\mathbf{l},\ \mathbf{l})$
collisions)12 eminently reasonable.

12 An attentive reader will notice that use is made of the fact that the
transformation

$$A(\mathbf{l}):\ (\mathbf{v}, \mathbf{w}) \to (\mathbf{v} + \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\mathbf{l},\ \mathbf{w} - \{(\mathbf{w} - \mathbf{v}) \cdot \mathbf{l}\}\mathbf{l})$$

is equal to its inverse $A(-\mathbf{l})$.

It thus came as a great surprise when a straightforward consequence
of (III.1.6) appeared quite paradoxical.

The consequence in question is the celebrated H-theorem of
Boltzmann to the effect that

(III.2.1) $$\frac{d}{dt} \int d\mathbf{v}\, f(\mathbf{v}; t) \log f(\mathbf{v}; t) \le 0$$

the equality occurring only for

$$f = \text{const}\, \exp(-\alpha\, \mathbf{v} \cdot \mathbf{v})$$

The H-theorem is remarkable because it provides a bridge
between mechanics and thermodynamics. Indeed, Boltzmann
saw in it a mechanistic derivation of the second law with

$$H = \int d\mathbf{v}\, f(\mathbf{v}; t) \log f(\mathbf{v}; t)$$

playing the role of negative entropy.


3. The reason why the conclusion of the H-theorem cannot be 
taken literally is that in its naive interpretation it contradicts 
mechanics. 

In fact, all laws of mechanics are reversible in time (invariant
under the transformation $t \to -t$), whereas the H-theorem singles
out a time direction. This was the earliest objection to Boltzmann’s
approach and was first voiced by Loschmidt in 1876 (it is
usually referred to as the “reversibility paradox”). Loschmidt
simply pointed out that if a gas starting from some initial state $S_0$
reaches after time $t$ a state $S_t$, then presumably $H_t \le H_0$. Now
reverse all the velocities; then on the one hand after time $t$, $S_t$ will
come back to $S_0$, and on the other hand Boltzmann’s H-theorem
would still yield $H_0 \le H_t$. Thus $H$ could not change. (To this
objection Boltzmann reportedly replied “go ahead, reverse them!”)

A more decisive objection was made by Zermelo who pointed
out that on the basis of Poincaré’s recurrence theorem
(“Wiederkehrsatz”), a closed dynamical system must (unless it happens to
start from an exceptional initial state) eventually come back
arbitrarily near its initial state. Thus if $H$ were a purely dynamical
quantity it could not always decrease.

Boltzmann tried to point out that the periods of time involved,
called Poincaré cycles, are enormously long (and again reportedly
replied “you should wait that long!”).

Poincaré’s theorem is so simple and so fundamental that we 
shall interrupt the narrative to prove and extend it. 


4. A dynamical system of $n$ degrees of freedom can be looked
upon as a point in $2n$-dimensional space (phase space, $\Gamma$-space).
The coordinates in $\Gamma$-space are

$$q_1, \dots, q_n,\ p_1, \dots, p_n$$

i.e., the generalized coordinates and conjugate momenta.
The behavior of the system is determined by the Hamiltonian

$$H(q_1, \dots, q_n,\ p_1, \dots, p_n)$$

(assumed not to contain $t$ explicitly) and the equations of motion
are

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$$

It follows that $H$ (which is now the Hamiltonian and should not
be confused with Boltzmann’s $H$ of Section 2) is a constant of
motion (conservation of energy), and we assume that the “energy
surface”

$$H(q_1, \dots, q_n,\ p_1, \dots, p_n) = E$$

is a bounded set.
Suppose that our system starts from the point

$$P_0 = (q_1^0, \dots, q_n^0,\ p_1^0, \dots, p_n^0)$$

The position at time $t$ is obtained by solving the equations of
motion

$$q_i(t) = f_i(q_1^0, \dots, q_n^0,\ p_1^0, \dots, p_n^0;\ t)$$

$$p_i(t) = g_i(q_1^0, \dots, q_n^0,\ p_1^0, \dots, p_n^0;\ t)$$

and these functions define a one-parameter family $T_t$ of transformations
of $\Gamma$ onto itself.


The famous Liouville theorem states that $T_t$ preserves the
ordinary Lebesgue measure in $\Gamma$:

Lebesgue measure of $T_t(A)$ = Lebesgue measure of $A$

If we restrict our system to the energy surface
$H(q_1, \dots, q_n,\ p_1, \dots, p_n) = E$ Liouville’s theorem implies the following. If the
measure $\mu(A)$ of a set $A$ on this energy surface is defined by the
formula

$$\mu(A) = \int_A \|\mathrm{grad}\, H\|^{-1}\, d\sigma$$

(we assume that $\|\mathrm{grad}\, H\| > c > 0$ on the surface so that the
measure of the whole surface is finite; it is enough to assume just
this without a separate assumption of boundedness of the surface)
where $d\sigma$ is the surface element and

$$\|\mathrm{grad}\, H\|^2 = \sum_{i=1}^{n} \Bigl[\Bigl(\frac{\partial H}{\partial q_i}\Bigr)^2 + \Bigl(\frac{\partial H}{\partial p_i}\Bigr)^2\Bigr]$$

then

$$\mu(T_t(A)) = \mu(A)$$

All this is by the way of introduction. Mathematically, we
abstract the following situation.

In a set $\Omega$ (energy surface) a completely additive measure $\mu$ is
defined such that $\mu(\Omega) = 1$ (for the sake of convenience; what’s
important is that $\mu(\Omega) < \infty$). We have furthermore a one-parameter
family $T_t$ of one-to-one measure preserving transformations
(that transformations $T_t$ are one-to-one in the dynamical case is
a trivial consequence of the equations of motion).

Poincaré’s recurrence theorem can now be stated as follows.

Let $A$ be a subset of $\Omega$ such that $\mu(A) > 0$. Then for almost
every $\omega \in A$ (i.e., except for a set of $\omega$’s of $\mu$-measure 0) there exist
arbitrarily large $t$ such that $T_t \omega \in A$.

There are many proofs of this theorem all of which are almost
trivial.13 We choose one (by no means the simplest) convenient
for our purposes.

13 We have here another example of an important and even profound
fact whose purely mathematical content is very much on the surface.

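First of all, rather than to consider continuous time let us
discretize it and consider only

$$T_1 \omega,\ T_2 \omega,\ T_3 \omega,\ \dots$$

It is clear that

$$T_n = T_1^n$$

Let now $A_1$ be the set of $\omega$’s such that

$$\omega \in A,\quad T_1 \omega \in A$$

and $A_n$ the set of $\omega$’s such that

$$\omega \in A,\quad T_1 \omega \notin A,\ \dots,\ T_{n-1} \omega \notin A,\quad T_n \omega \in A$$

Otherwise stated

$$A_1 = A \cap T_1^{-1} A$$

$$A_2 = A \cap C(T_1^{-1} A) \cap T_1^{-2} A$$

$$A_n = A \cap C(T_1^{-1} A) \cap C(T_1^{-2} A) \cap \dots \cap C(T_1^{-(n-1)} A) \cap T_1^{-n} A$$

where $C(B)$ denotes the complement of $B$ (in the set $\Omega$).
Let $f(\omega)$ be the characteristic function of the set $A$, i.e.,

$$f(\omega) = \begin{cases} 1, & \omega \in A \\ 0, & \omega \notin A \end{cases}$$

Then, the characteristic function of $A_n$ is clearly

$$f(\omega)\,(1 - f(T_1 \omega)) \cdots (1 - f(T_1^{n-1} \omega))\, f(T_1^n \omega)$$

and hence

$$\mu(A_n) = \int_{\Omega} f(\omega)\,(1 - f(T_1 \omega)) \cdots (1 - f(T_1^{n-1} \omega))\, f(T_1^n \omega)\, d\mu$$

Set

$$w_n = \int_{\Omega} (1 - f(\omega))\,(1 - f(T_1 \omega)) \cdots (1 - f(T_1^{n-1} \omega))\, d\mu$$

and note that

$$\mu(A_n) = w_{n-1} - 2w_n + w_{n+1} \qquad (w_0 = 1)$$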

In deriving these simple formulas it should be observed that
we make essential use of the fact that $T_1$ is measure preserving.

In fact, taking, e.g., $\mu(A_3)$ we write

$$\begin{aligned}
\mu(A_3) &= \int_{\Omega} f(\omega)\,(1 - f(T_1 \omega))\,(1 - f(T_1^2 \omega))\, f(T_1^3 \omega)\, d\mu \\
&= \int_{\Omega} (1 - f(T_1 \omega))\,(1 - f(T_1^2 \omega))\, d\mu \\
&\quad - \int_{\Omega} (1 - f(\omega))\,(1 - f(T_1 \omega))\,(1 - f(T_1^2 \omega))\, d\mu \\
&\quad - \int_{\Omega} (1 - f(T_1 \omega))\,(1 - f(T_1^2 \omega))\,(1 - f(T_1^3 \omega))\, d\mu \\
&\quad + \int_{\Omega} (1 - f(\omega))\,(1 - f(T_1 \omega))\,(1 - f(T_1^2 \omega))\,(1 - f(T_1^3 \omega))\, d\mu
\end{aligned}$$

and since $T_1$ is measure preserving, the two middle integrals are
equal. Thus

$$\mu(A_3) = w_2 - 2w_3 + w_4$$

Now note that

$$\mu(A_1) + \dots + \mu(A_n) = 1 - w_1 - (w_n - w_{n+1})$$

and since the sequence $\{w_n\}$ is nonincreasing and bounded from
below (by 0) $\lim w_n$ exists and consequently $\lim\, (w_n - w_{n+1}) = 0$.
Thus

$$\sum_{n=1}^{\infty} \mu(A_n) = 1 - w_1 = \mu(A)$$

Thus almost every point $\omega$ of $A$ is such that at least one of its
iterates $T_1 \omega, T_1^2 \omega, \dots$ must be in $A$.

This implies immediately that almost every point $\omega$ of $A$ is
such that infinitely many of its iterates are in $A$.

In fact, let $D_k$ be the set of $\omega$’s such that $\omega \in A$ and $T_1^{nk} \omega \notin A$
for $n \ge 1$. Applying the theorem just proved to $T_1^k$ (instead of
$T_1$) we get $\mu(D_k) = 0$ and hence also

$$\mu\Bigl(\bigcup_{k=1}^{\infty} D_k\Bigr) = 0$$

Thus almost every point in $A$ returns infinitely often to $A$ and the
proof of Poincaré’s theorem is complete.
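The quantities $w_n$ and $\mu(A_n)$ can be computed explicitly in the simplest possible setting: a finite set with the uniform measure, on which every permutation is a one-to-one measure preserving transformation. The Python sketch below is an illustration under exactly these assumptions (the function names and the sample permutation are ours); it checks the relation $\mu(A_n) = w_{n-1} - 2w_n + w_{n+1}$ and the conclusion $\sum \mu(A_n) = \mu(A)$.

```python
from fractions import Fraction

def w(perm, A, n):
    """w_n: measure of the points x with x, Tx, ..., T^{n-1}x all outside A (T(x) = perm[x])."""
    count = 0
    for x in range(len(perm)):
        y, outside = x, True
        for _ in range(n):
            if y in A:
                outside = False
                break
            y = perm[y]
        count += outside
    return Fraction(count, len(perm))

def mu_first_return(perm, A, n):
    """mu(A_n): x in A, T^k x outside A for 0 < k < n, and T^n x in A."""
    count = 0
    for x in A:
        y, k = perm[x], 1
        while k < n and y not in A:
            y = perm[y]
            k += 1
        count += (k == n and y in A)
    return Fraction(count, len(perm))

if __name__ == "__main__":
    perm = [1, 2, 3, 0, 5, 6, 4]      # two cycles: 0->1->2->3->0 and 4->5->6->4
    A = {0, 2, 5}
    for n in range(1, 5):
        second_difference = w(perm, A, n - 1) - 2 * w(perm, A, n) + w(perm, A, n + 1)
        print(n, mu_first_return(perm, A, n), second_difference)
    print(sum(mu_first_return(perm, A, n) for n in range(1, 8)), Fraction(len(A), len(perm)))
```

In this example $\mu(A_2) = 2/7$ and $\mu(A_3) = 1/7$, and their sum is $\mu(A) = 3/7$: every point of $A$ returns.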


5. We now have the machinery to calculate also the mean recurrence
time (or the Poincaré cycle as it is often called). For
intermittent observations, i.e., for the discrete or quantized time,
the mean recurrence time is defined by the formula

$$\Theta_1^* = \frac{1}{\mu(A)} \sum_{k=1}^{\infty} k\, \mu(A_k)$$

More generally, taking observations every $\tau$ seconds (rather than
every second) and hence replacing $T_1$ by $T_\tau$ we have

$$\Theta_\tau^* = \frac{\tau}{\mu(A)} \sum_{k=1}^{\infty} k\, \mu(A_k)$$

Now,

$$\sum_{k=1}^{n} k\, \mu(A_k) = \sum_{k=1}^{n} k\,(w_{k-1} - 2w_k + w_{k+1}) = 1 - w_n - n(w_n - w_{n+1})$$

and since the sequence $\sum_{k=1}^{n} k\, \mu(A_k)$ is nondecreasing the sequence
$w_n + n(w_n - w_{n+1})$ is nonincreasing. Since clearly $w_n + n(w_n - w_{n+1}) \ge 0$,
the limit

$$\lim_{n \to \infty} \{w_n + n(w_n - w_{n+1})\}$$

exists, and since $\lim w_n$ exists, we deduce the existence of the limit

$$\lim_{n \to \infty} n(w_n - w_{n+1})$$

Furthermore the series

$$(w_1 - w_2) + (w_2 - w_3) + \dots$$

converges (this is simply the restatement of the existence of the
$\lim w_n$) and hence (a nonzero limit $c$ would make the terms behave
like $c/n$ and the series diverge)

$$\lim_{n \to \infty} n(w_n - w_{n+1}) = 0$$

Finally,

$$\sum_{k=1}^{\infty} k\, \mu(A_k) = 1 - \lim_{n \to \infty} w_n$$

The most interesting case is when

$$\lim_{n \to \infty} w_n = 0$$


This need not be true but is true if one assumes that $T_\tau$ is
“metrically transitive,” i.e., the only sets left invariant by $T_\tau$ are
either of $\mu$ measure 0 or 1. To see this note that $\lim w_n$ is the
measure of the set of $\omega$’s such that

$$\omega \notin A,\quad T_\tau \omega \notin A,\quad T_\tau^2 \omega \notin A,\ \dots$$

Call this set $B$.
Consider now the set $T_\tau B$. If

$$\omega \in T_\tau B$$

then

$$T_\tau^{-1} \omega \in B$$

i.e.,

$$T_\tau^{-1} \omega \notin A,\quad \omega \notin A,\quad T_\tau \omega \notin A,\ \dots$$

and hence $\omega \in B$ or in other words $T_\tau B \subset B$. Similarly
$T_\tau^2 B \subset T_\tau B$ etc. Set

$$C = \lim_{n \to \infty} T_\tau^n B$$

and observe that $T_\tau C = C$. Hence by metric transitivity $\mu(C)$ is
either 0 or 1. Since $\mu(T_\tau^n B) = \mu(B)$ and $T_\tau^{n+1} B \subset T_\tau^n B$ it
follows that $\mu(C) = \mu(B)$ or $\mu(B)$ is either 0 or 1. $\mu(B)$ cannot
be 1 since this would mean $\mu(A) = 0$ contrary to the assumption.
Thus $\mu(B) = 0$, i.e., $\lim w_n = 0$ and consequently, for metrically
transitive $T_\tau$

(III.5.1) $$\Theta_\tau^* = \tau / \mu(A)$$

As is well known, metric transitivity plays an important part
in ergodic theory; however it is almost impossible to decide what
Hamiltonians give rise to metrically transitive transformations.

From the point of view of dynamics the formula (III.5.1) is
thus relatively useless.

It has also the disadvantage that in the limit $\tau \to 0$ (i.e., passing
to continuous observations) we always get the trivial limit 0. This
is readily remedied if one notices that this degeneracy is due to
counting the event $\{\omega \in A,\ T_\tau \omega \in A\}$ as a return after time $\tau$.
For a continuous motion the “probability” of this event, i.e.,

$$\mu\{\omega \in A,\ T_\tau \omega \in A\} / \mu(A)$$

should be very close to 1, and hence a “fake” return (after time $\tau$)
is assigned almost all the weight. Following Smoluchowski, one
defines the mean recurrence time by the formula

$$\Theta_\tau = \tau \sum_{k=1}^{\infty} k\, \mu(A_{k+1}) \Big/ \sum_{k=1}^{\infty} \mu(A_{k+1})$$

We have (assuming metric transitivity of $T_\tau$)

$$\begin{aligned}
\sum_{k=1}^{\infty} k\, \mu(A_{k+1}) &= \sum_{k=1}^{\infty} (k + 1)\, \mu(A_{k+1}) - \sum_{k=1}^{\infty} \mu(A_{k+1}) \\
&= 1 - \mu(A_1) - (\mu(A) - \mu(A_1)) = 1 - \mu(A)
\end{aligned}$$

and finally

$$\Theta_\tau = \tau\, [1 - \mu(A)] / [\mu(A) - \mu\{\omega \in A,\ T_\tau \omega \in A\}]$$
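Both definitions of the mean recurrence time can be evaluated in a toy example. The Python sketch below is only an illustration, with assumptions of our own: the space is the set of residues modulo `size` with the uniform measure, the transformation is the rotation $x \to x + 1$ (a single cycle, hence metrically transitive), and $\tau = 1$. The function names are ours.

```python
from fractions import Fraction

def first_return_measures(size, A):
    """mu(A_k), k = 1..size, for T(x) = x + 1 mod size and the uniform measure (index 0 unused)."""
    A = set(A)
    mu = [Fraction(0)] * (size + 1)
    for x in A:
        k = 1
        while (x + k) % size not in A:   # terminates: x itself is reached at k = size
            k += 1
        mu[k] += Fraction(1, size)
    return mu

def mean_recurrence_times(size, A):
    """Return (Theta*, Theta) with tau = 1; Theta requires mu(A) > mu(A_1)."""
    mu = first_return_measures(size, A)
    mu_A = Fraction(len(set(A)), size)
    theta_star = sum(k * mu[k] for k in range(1, size + 1)) / mu_A
    numerator = sum(k * mu[k + 1] for k in range(1, size))
    denominator = sum(mu[k + 1] for k in range(1, size))
    return theta_star, numerator / denominator

if __name__ == "__main__":
    size, A = 10, {0, 1, 2, 7}
    mu = first_return_measures(size, A)
    mu_A = Fraction(len(A), size)
    print(mean_recurrence_times(size, A))
    print(1 / mu_A, (1 - mu_A) / (mu_A - mu[1]))
```

For $A = \{0, 1, 2, 7\}$ the first return times are 1, 1, 5 and 3, so $\Theta^* = 10/4 = 5/2 = 1/\mu(A)$, while discarding the two immediate returns gives $\Theta = (3 - 1 + 5 - 1)/2 = 3$, in agreement with $[1 - \mu(A)]/[\mu(A) - \mu(A_1)]$.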


This formula can be used to pass to the limit $\tau \to 0$ and all depends
now on the existence of

$$\lim_{\tau \to 0} \tau^{-1} \{\mu(A) - \mu(\omega \in A,\ T_\tau \omega \in A)\}$$

Although the notion of mean recurrence time plays an im- 
portant part in the discussion of the foundations of statistical 
mechanics, it is clearly a rather crude notion. 

The phenomenon of leaving and reentering $A$ is, in general,
far too irregular to be adequately described by the mean cycle
alone. Unfortunately the mean cycle is the only quantity which is
tractable for general dynamical systems, and even here we have to
rely on the almost unverifiable assumption of metric transitivity.


6. There is a formulation of the results of Sections 4 and 5 which
although different in flavor is formally equivalent.

Suppose that on a set $\Omega$ (of total measure 1) in which a completely
additive measure $\mu$ has been defined we have a one-parameter
family $X(t; \omega)$ ($\omega \in \Omega$, $-\infty < t < \infty$) of real-valued
$\mu$-measurable functions (i.e., for each $t$ the set of $\omega$’s for which
$X(t; \omega) < \alpha$ is $\mu$-measurable for every real $\alpha$).

Such a family is called a stochastic process. A process is
called stationary in the strict sense (or strongly stationary) if

$$\mu\{X(t_1; \omega) < \alpha_1, \dots, X(t_n; \omega) < \alpha_n\} = \mu\{X(t_1 + \tau; \omega) < \alpha_1, \dots, X(t_n + \tau; \omega) < \alpha_n\}$$

for all $\tau,\ t_1, \dots, t_n,\ \alpha_1, \dots, \alpha_n$.

Since all this may sound somewhat abstract and unmotivated,
let us try to explain briefly what it is that one tries to formalize.

When a physicist speaks of “the displacement $X(t)$, at time $t$,
of a Brownian particle” he clearly does not mean that $X(t)$ is a
well-defined function of $t$. What he has in mind is that $X(t)$ depends
not only on $t$ but on something which is vaguely (and to
some extent misleadingly) called “chance.”

Recognizing the presence of “chance” he now settles on a 
statistical theory whose aim is to predict probabilities of occurrence 
or nonoccurrence of certain events. To construct such a theory he 
must (as we have repeatedly stressed in Chapter I) assign probabi- 
lities to some “elementary” events from which presumably other 
pertinent probabilities can then be calculated. 

This picture can be formalized by defining a stochastic process 
as before. 

The set $\Omega$ need not have any special significance, and its role
is purely auxiliary. One could actually take for $\Omega$ the set of all
allowable functions $X(t)$ and look upon $\omega$ as a labeling variable.

There is a rather fundamental mathematical problem which 
arises in this connection and which ought to be at least mentioned. 
What one extracts from a physical theory is a set of functions

    $\sigma(\alpha_1, t_1; \alpha_2, t_2; \ldots; \alpha_n, t_n)$

which one hopes will represent the probabilities

    $\mathrm{Prob}\{X(t_1) < \alpha_1, \ldots, X(t_n) < \alpha_n\}$


if such probabilities can be properly defined. 

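The problem is thus the following: does there exist a set $\Omega$
with a completely additive measure $\mu$ (probability) and a family of
measurable functions $X(t;\omega)$ ($\omega \in \Omega$) such that:

    $\mathrm{Prob}\{X(t_1) < \alpha_1, \ldots, X(t_n) < \alpha_n\}$
        $= \mu\{X(t_1;\omega) < \alpha_1, \ldots, X(t_n;\omega) < \alpha_n\} = \sigma(\alpha_1, t_1; \ldots; \alpha_n, t_n)$ ?

It is, of course, understood that the functions $\sigma$ satisfy the obvious
consistency condition, e.g.,

    $\sigma(\alpha_1, t_1; \ldots; \alpha_{n-1}, t_{n-1}; \infty, t_n) = \sigma(\alpha_1, t_1; \ldots; \alpha_{n-1}, t_{n-1})$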

This question has been answered affirmatively by Kolmogoroff and 
in greater generality by Doob. The resulting theorem, although of 
considerable importance since it provides an existence proof of 
objects about which one wants to speak, has usually little bearing 
on the more analytic aspects of probability theory. There are 
however exceptions, and in Chapter IV the question will be con- 
sidered again. 

For the time being we shall simply restate the results of 
Sections 4 and 5 in the terminology of stationary stochastic pro- 
cesses. 

Suppose that $X(t;\omega)$ is a strongly stationary stochastic process
and $A$ a set of real numbers such that the set of $\omega$'s for which
$X(t;\omega) \in A$ is measurable and of positive measure. (It follows from
stationarity that

    $\mathrm{Prob}\{X(t;\omega) \in A\} = \mu\{X(t;\omega) \in A\}$

is the same for all $t$.) Now define the following quantities:

    $W_1(A) = W(A) = \mathrm{Prob}\{X(t;\omega) \in A\}$

    $W_{n+1}(A, \bar{A}, \ldots, \bar{A}, A) = \mathrm{Prob}\{X(0;\omega) \in A,\ X(\tau;\omega) \notin A, \ldots,$
        $X((n-1)\tau;\omega) \notin A,\ X(n\tau;\omega) \in A\}$

    $P(A \mid \underbrace{\bar{A}, \ldots, \bar{A}}_{n-1}, A) = W_{n+1}(A, \bar{A}, \ldots, \bar{A}, A)/W_1(A)$ ¹⁴

(Recall that $W_1(A) = W(A) > 0$.) Then,

    $\sum_{n=1}^{\infty} P(A \mid \underbrace{\bar{A}, \ldots, \bar{A}}_{n-1}, A) = 1$

and

(III.6.1)  $\sum_{n=1}^{\infty} n\tau\, P(A \mid \underbrace{\bar{A}, \ldots, \bar{A}}_{n-1}, A) = \tau/W_1(A) = \tau/W(A)$

provided:

(III.6.2)  $\lim_{n\to\infty} W_n(\underbrace{\bar{A}, \ldots, \bar{A}}_{n}) = 0$

(The definitions of $W_n(\bar{A}, \ldots, \bar{A})$ and similar quantities are
self-explanatory.)

14 This is, by definition, the so-called conditional probability that
starting from the set $A$ at $t = 0$ and proceeding in steps of duration $\tau$ we
reenter $A$ for the first time after $n$ steps.

The analog of Smoluchowski’s mean recurrence time is clearly 
(III.6.3)  $\theta_\tau = \tau[1 - W(A)]/[W(A) - W_2(A, A)]$
These formulas can be derived directly by repeating the simple 
steps outlined in Sections 4 and 5, and in going through the deri- 
vations one sees clearly where the assumption of strong stationarity 
is used. One could also derive these formulas from those of
Sections 4 and 5 by considering an artificial measure preserving
transformation $T_\tau$ (the so-called “shift transformation”).

Conversely, the reader will notice that $f(T_t\omega)$ of Section 4 is,
according to our definition, a stationary stochastic process.

Thus the “dynamical theorems” of Sections 4 and 5 and the 
“statistical theorems” (III.6.1) and (III.6.2) of this section are 
identical from the purely mathematical point of view. 

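A final remark on (III.6.2) is in order. Let $\chi_A(x)$ be the
characteristic function of the set $A$ (i.e., 1 if $x \in A$ and 0 otherwise)
and consider

    $\int_\Omega \left\{ \frac{1}{n}[\chi_A(X(\tau;\omega)) + \ldots + \chi_A(X(n\tau;\omega))] - W(A) \right\}^2 d\mu$

We have on one hand

    $\int_\Omega \left\{ \frac{1}{n}[\chi_A(X(\tau;\omega)) + \ldots + \chi_A(X(n\tau;\omega))] - W(A) \right\}^2 d\mu$
        $\ge \int_{\Omega_n} \left\{ \frac{1}{n}[\chi_A(X(\tau;\omega)) + \ldots + \chi_A(X(n\tau;\omega))] - W(A) \right\}^2 d\mu = W^2(A)\,\mu(\Omega_n)$

where $\Omega_n$ is the set on which $X(\tau;\omega) \notin A, \ldots, X(n\tau;\omega) \notin A$ (so that
every $\chi_A$ in the integrand vanishes on $\Omega_n$) and hence

    $W_n(\bar{A}, \ldots, \bar{A}) = \mu(\Omega_n) \le W^{-2}(A) \int_\Omega \left\{ \frac{1}{n}[\chi_A(X(\tau;\omega)) + \ldots + \chi_A(X(n\tau;\omega))] - W(A) \right\}^2 d\mu$

It is now easily verified that if

(III.6.4)  $\lim_{n\to\infty} \mathrm{Prob}\{X(0;\omega) \in A,\ X(n\tau;\omega) \in A\} = W^2(A)$

(asymptotic independence!), then

    $\lim_{n\to\infty} \int_\Omega \left\{ \frac{1}{n}[\chi_A(X(\tau;\omega)) + \ldots + \chi_A(X(n\tau;\omega))] - W(A) \right\}^2 d\mu = 0$

and hence

(III.6.5)  $\lim_{n\to\infty} W_n(\underbrace{\bar{A}, \ldots, \bar{A}}_{n}) = 0$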


The property (III.6.4) (called “strong mixing’’) implies metric 
transitivity, and hence its verification for dynamical systems is 
nearly hopeless. For many “statistical” systems (III.6.4) is 
easily verified. 


7. The objections of Loschmidt and Zermelo made it clear that 
the naive formulation of the H-theorem is untenable. 

The “Stosszahlansatz” (III.1.3) on which the derivation of 
Boltzmann’s equation rests cannot be a purely dynamical conclu- 
sion, and hence some reinterpretation is necessary. Boltzmann 
himself proposed that the H-theorem be interpreted statistically. 

In spite of the brilliant analysis of P. and T. Ehrenfest which 
elucidated Boltzmann’s not always clear ideas and made them 
highly plausible, the question even now (45 years after the ap- 
pearance of Ehrenfest’s work!) is not fully settled. 

There are broadly (and somewhat vaguely) speaking two 
problems: 




I. Is it possible to reconcile both time reversibility and re- 
currence with “observable” irreversible behavior? 


II. Is it possible to achieve such a reconciliation in the realm 
of classical mechanics? 


Problem I is essentially logical in character. An affirmative 
answer can be obtained if a model is found which exhibits the 
required features. 

P. and T. Ehrenfest proposed such a model in 1907. It is 
probably one of the most instructive models in the whole of Physics 
and although merely an example of a finite Markoff chain, it is of 
considerable independent interest. 

The model in its original formulation can be described as 
follows.¹⁵ 2R balls numbered from 1 to 2R are distributed in two 
boxes A and B. An integer from 1 to 2R is chosen “at random” 
and the ball corresponding to that number is moved from the box 
in which it is to the other one. The process is then repeated a 
desired number of times. 
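
The drawing procedure just described can be tried out in a few lines. The following is only a minimal sketch under stated assumptions: the function name `simulate_ehrenfest`, its parameters, and the fixed seed are illustrative choices and do not come from the text, and Python's pseudo-random generator stands in for drawing a number “at random.”

```python
import random


def simulate_ehrenfest(total_balls, steps, seed=0):
    """Return n_A(0), n_A(1), ..., n_A(steps), starting with all balls in box A."""
    rng = random.Random(seed)              # seeded so that a run can be repeated
    in_a = [True] * total_balls            # in_a[i] is True while ball i+1 sits in box A
    n_a = total_balls
    history = [n_a]
    for _ in range(steps):
        ball = rng.randrange(total_balls)  # choose a ball number "at random"
        in_a[ball] = not in_a[ball]        # move that ball to the other box
        n_a += 1 if in_a[ball] else -1
        history.append(n_a)
    return history


history = simulate_ehrenfest(total_balls=40, steps=200)
excess = [abs(2 * n - 40) for n in history]   # |n_A - n_B| = 2|n_A - R| with 2R = 40
print(excess[:10], "...", excess[-10:])
```

Plotting `excess` against the step number gives a curve of the same kind as the experimental record mentioned below; the particular values depend on the seed.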

Intuitively it is perfectly clear what one can expect. 

Suppose, for the sake of simplicity, that initially all 2R balls
are in box A. The first drawing will necessarily result in moving a
ball from A to B. The second drawing may result in returning to
the original state but the probability of this is $(2R)^{-1}$. If 2R is
very large (of the order $10^{23}$, say, to use a number comparable with
the Avogadro number), then with overwhelming probability, i.e.,
$1 - (2R)^{-1}$, another ball will migrate from A to B.

In fact as long as $n_A$ (number of balls in A) is much larger than
$n_B$ (number of balls in B) we should observe a “flow” of balls from
A to B.

If our mathematical notion of probability corresponds to 
some kind of “reality” we should expect to observe a “nearly ir- 
reversible” flow in a preferred direction. 

We cannot say with certainty that $n_A(s)$ (number of balls in
A after s drawings) always decreases, but we can confidently
expect that in some technical sense $n_A(s)$ decreases “nearly
always.”¹⁶

An experiment was actually performed with only 40 balls and
the resulting graph is reproduced as Fig. 2. How does one use the
model to refute the objections of Loschmidt and Zermelo?

15 It is often referred to as the “dog-flea model.”




Figure 2. The ordinate on the graph represents the absolute excess
$|n_A(s) - n_B(s)| = 2|n_A(s) - R|$.

16 One cannot resist here the repetition of the motto of the chapter on
entropy in the well-known book, Statistical Mechanics by Maria G. and Joseph
Mayer: “What never? No never! What never? Well, hardly ever.” It is to
the everlasting credit of Gilbert and Sullivan that they hit upon this most
picturesque and succinct formulation of the statistical version of the second
law of thermodynamics!

To discuss time reversal let us calculate

    $\mathrm{Prob}\{n_A(s+1) = n \mid n_A(s) = m\}$

(i.e., the conditional probability that $n_A(s+1) = n$ if $n_A(s) = m$)
and

    $\mathrm{Prob}\{n_A(s-1) = n \mid n_A(s) = m\}$

(i.e., the conditional probability that $n_A(s-1) = n$ if $n_A(s) = m$).
From the most elementary considerations we obtain

(III.7.1)  $\mathrm{Prob}\{n_A(s+1) = n \mid n_A(s) = m\}$
               $= (2R)^{-1} m\,\delta(n - m + 1) + (2R)^{-1}(2R - m)\,\delta(n - m - 1)$

($\delta(l)$ is the “Kronecker delta”), but it is a little more difficult to
calculate

    $\mathrm{Prob}\{n_A(s-1) = n \mid n_A(s) = m\}$


To do this and to gain more insight into the meaning of conditional 
probabilities let us formulate the problem with greater precision. 
Assuming $n_A(0) = n_0$ we see that the underlying sample space S
is the space of all infinite sequences $n_0, n_1, n_2, \ldots$ of nonnegative
integers between 0 and 2R such that

    $n_k = n_{k-1} \pm 1$

The “elementary events” are sets of sequences whose first $l$
($l = 1, 2, \ldots$) elements are fixed.
For instance the set of all sequences beginning with

    $n_0,\ n_0 + 1,\ n_0$

is an “elementary event.”
The measure assigned to an elementary event is from the
construction of our model

(III.7.2)  $P(n_0|n_1)\,P(n_1|n_2) \cdots P(n_{l-1}|n_l)$

where $P(n_{k-1}|n_k)$ is the probability of transition $n_{k-1}$ to $n_k$ and is
given by

    $P(n_{k-1}|n_k) = (2R)^{-1} n_{k-1}\,\delta(n_k - n_{k-1} + 1) + (2R)^{-1}(2R - n_{k-1})\,\delta(n_k - n_{k-1} - 1)$

Consider now the probability

    $\mathrm{Prob}\{n_A(s-1) = n,\ n_A(s) = m\}$

This probability is by definition simply

    $\sum_{n_{s-1}=n,\ n_s=m} P(n_0|n_1)\,P(n_1|n_2) \cdots P(n_{s-1}|n_s)$

the summation being over all $n_1, n_2, \ldots, n_s$ (the formula for
$P(n_{k-1}|n_k)$ implies already that $n_k$ is either $n_{k-1} + 1$ or $n_{k-1} - 1$).
The desired conditional probability is defined as the ratio

    $\mathrm{Prob}\{n_A(s-1) = n \mid n_A(s) = m\} = \dfrac{\mathrm{Prob}\{n_A(s-1) = n,\ n_A(s) = m\}}{\mathrm{Prob}\{n_A(s) = m\}}$

while the conditional probability (III.7.1) is (by definition)

    $\mathrm{Prob}\{n_A(s+1) = n \mid n_A(s) = m\} = \dfrac{\mathrm{Prob}\{n_A(s+1) = n,\ n_A(s) = m\}}{\mathrm{Prob}\{n_A(s) = m\}}$

Using again the definition of conditional probability we can write

    $\mathrm{Prob}\{n_A(s-1) = n,\ n_A(s) = m\}$
        $= \mathrm{Prob}\{n_A(s-1) = n\} \cdot \mathrm{Prob}\{n_A(s) = m \mid n_A(s-1) = n\}$
        $= \mathrm{Prob}\{n_A(s-1) = n\} \cdot \{(2R)^{-1} n\,\delta(m - n + 1) + (2R)^{-1}(2R - n)\,\delta(m - n - 1)\}$

Finally,

(III.7.3)  $\mathrm{Prob}\{n_A(s-1) = n \mid n_A(s) = m\}$
               $= \dfrac{\mathrm{Prob}\{n_A(s-1) = n\}}{\mathrm{Prob}\{n_A(s) = m\}} \cdot \{(2R)^{-1} n\,\delta(m - n + 1) + (2R)^{-1}(2R - n)\,\delta(m - n - 1)\}$

Recall that we have assumed $n_A(0) = n_0$. Thus $\mathrm{Prob}\{n_A(s-1) = n\}$
and $\mathrm{Prob}\{n_A(s) = m\}$ (and hence their ratio) may (and
indeed does) depend on $n_0$.¹⁷ One can hope that as $s \to \infty$

    $\lim_{s\to\infty} \mathrm{Prob}\{n_A(s) = m\} = W(m)$   ($W(m)$ independent of $n_0$!)

exists. Physically, this is what one expects, and it corresponds to 
the feeling (or perhaps only a strong desire!) that as time goes on, 
the probability of finding our system in state m is well defined and 
independent of the initial state. Unfortunately, for the Ehrenfest 
model this is not the case! The origin of this difficulty is not deep, 
but we postpone its discussion until a little later. 

For the time being we shall extricate ourselves from this 
dilemma by a simple device. 

Instead of keeping $n_0$ fixed we subject it to a distribution
$W(n_0)$.

17 A more consistent way of writing these probabilities is
$\mathrm{Prob}\{n_A(s-1) = n \mid n_A(0) = n_0\}$ and $\mathrm{Prob}\{n_A(s) = m \mid n_A(0) = n_0\}$.

In other words the sample space S will now be the set of all sequences

    $n_0, n_1, n_2, \ldots$

($n_0$ can be any integer from 0 to 2R) and the measure of the
elementary set, instead of (III.7.2), is

(III.7.4)  $W(n_0)\,P(n_0|n_1) \cdots P(n_{l-1}|n_l)$

We now have

    $\mathrm{Prob}\{n_A(s) = m\} = \sum_{n_0, n_1, \ldots, n_{s-1}} W(n_0)\,P(n_0|n_1) \cdots P(n_{s-1}|m)$

and we can inquire whether $W(n_0)$ can be so chosen that for all s

    $\mathrm{Prob}\{n_A(s) = m\} = W(m)$

It is clear that the answer will be in the affirmative if the
system of equations

(III.7.5)  $W(m) = \sum_{n=0}^{2R} W(n)\,P(n|m)$   ($m = 0, 1, 2, \ldots, 2R$)

has a solution in nonnegative numbers.
It is easily seen that the unique solution (normalized by
$\sum W(m) = 1$) is

    $W(m) = {}^{2R}C_m\,2^{-2R}$

and with this choice and the corresponding definition of probability
(see (III.7.4)) we find

    $\mathrm{Prob}\{n_A(s-1) = n \mid n_A(s) = m\}$
        $= [W(n)/W(m)]\,\{(2R)^{-1} n\,\delta(n - m - 1) + (2R)^{-1}(2R - n)\,\delta(n - m + 1)\}$

We now check easily that

    $\mathrm{Prob}\{n_A(s-1) = n \mid n_A(s) = m\} = \mathrm{Prob}\{n_A(s+1) = n \mid n_A(s) = m\}$


It thus appears that our model (with an appropriate definition 
of probability) is time reversible! 
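
Both facts used here, that the binomial distribution solves (III.7.5) and that the backward and forward conditional probabilities agree, can be confirmed exactly for small 2R. The sketch below is an illustration only: the helper names `transition`, `stationary` and `check` are hypothetical, and exact rational arithmetic is used so that equality can be tested without rounding.

```python
from fractions import Fraction
from math import comb


def transition(n, m, total):
    """P(n|m): probability of passing from n balls in A to m balls in A in one drawing."""
    if m == n - 1:
        return Fraction(n, total)
    if m == n + 1:
        return Fraction(total - n, total)
    return Fraction(0)


def stationary(m, total):
    """W(m) = C(2R, m) 2^(-2R), with total = 2R."""
    return Fraction(comb(total, m), 2 ** total)


def check(total):
    states = range(total + 1)
    # (III.7.5): W(m) = sum over n of W(n) P(n|m)
    invariant = all(
        stationary(m, total)
        == sum(stationary(n, total) * transition(n, m, total) for n in states)
        for m in states
    )
    # Time reversal: W(n) P(n|m) / W(m) equals the forward probability P(m|n)
    reversible = all(
        stationary(n, total) * transition(n, m, total)
        == stationary(m, total) * transition(m, n, total)
        for n in states
        for m in states
    )
    return invariant, reversible


print(check(10))
```

The second condition is the statement that the backward probability $W(n)P(n|m)/W(m)$ coincides with the forward probability of going from m to n.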

At this point we may also reproduce a highly instructive 
calculation due to the Ehrenfests. 

Consider the conditional probability:

    $\mathrm{Prob}\{n_A(s-1) = m-1,\ n_A(s+1) = m-1 \mid n_A(s) = m\}$

which is clearly equal to

    $\dfrac{\mathrm{Prob}\{n_A(s-1) = m-1,\ n_A(s) = m,\ n_A(s+1) = m-1\}}{\mathrm{Prob}\{n_A(s) = m\}}$
        $= \dfrac{\mathrm{Prob}\{n_A(s-1) = m-1\}\,P(m-1|m)\,P(m|m-1)}{\mathrm{Prob}\{n_A(s) = m\}}$
        $= \dfrac{{}^{2R}C_{m-1}}{{}^{2R}C_m}\,(2R)^{-1}(2R - m + 1)\,(2R)^{-1} m = (2R)^{-2} m^2$

On the other hand, a similar calculation gives:

    $\mathrm{Prob}\{n_A(s-1) = m+1,\ n_A(s+1) = m-1 \mid n_A(s) = m\} = (2R)^{-2} m(2R - m)$

and finally

    $\mathrm{Prob}\{n_A(s-1) = m+1,\ n_A(s+1) = m+1 \mid n_A(s) = m\} = (2R)^{-2}(2R - m)^2$

It follows that if m is close to 2R (i.e., we are far from
equilibrium on the high side) the configuration (written as the
values of $n_A$ at $s-1$, $s$, $s+1$)

    $m-1,\ m,\ m-1$   (a peak at s)

is overwhelmingly more probable than the configurations:

    $m+1,\ m,\ m-1$       $m-1,\ m,\ m+1$       $m+1,\ m,\ m+1$

All this simply means that if from the ensemble of all “curves”

    $n_0, n_1, n_2, \ldots$

which $n_A$ can follow we pick the subensemble defined by the
condition

    $n_A(s) = m$

then an overwhelming proportion of the “curves” in the
subensemble will exhibit the configuration

    $m-1,\ m,\ m-1$


This little analysis provides an interpretation of the seemingly 
meaningless and paradoxical statement of Boltzmann that every 
point of the H-curve is a maximum. 

Interpreted statistically (as above) it makes perfect sense and 
is in agreement with both time reversibility and the tendency to 
decrease from a high value. 
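
To make the word “overwhelmingly” concrete, the conditional probabilities of the four possible shapes around a given value m can be tabulated exactly. This is a sketch under assumptions: the function name `shape_probabilities` is hypothetical, the stationary ensemble based on (III.7.4) is assumed, and the numbers 2R = 40, m = 36 are chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def shape_probabilities(m, total):
    """Probabilities of (n_A(s-1), n_A(s+1)) given n_A(s) = m, in the stationary ensemble."""

    def w(k):
        # W(k) = C(2R, k) 2^(-2R); zero outside the range 0..2R
        return Fraction(comb(total, k), 2 ** total) if 0 <= k <= total else Fraction(0)

    def p(a, b):
        # one-step transition probability from a balls in A to b balls in A
        if b == a - 1:
            return Fraction(a, total)
        if b == a + 1:
            return Fraction(total - a, total)
        return Fraction(0)

    return {
        (before, after): w(before) * p(before, m) * p(m, after) / w(m)
        for before in (m - 1, m + 1)
        for after in (m - 1, m + 1)
    }


for shape, prob in shape_probabilities(36, 40).items():
    print(shape, prob)
```

With these illustrative numbers the peak (35, 35) carries probability $m^2/(2R)^2 = 81/100$, while the valley (37, 37) carries only 1/100.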

To appreciate better the role of the distribution

    $W(m) = {}^{2R}C_m\,2^{-2R}$

let us now turn to the objection of Zermelo.
Consider again the sample space S of all sequences

    $n_0, n_1, n_2, \ldots$

(each $n_k$ being an integer between 0 and 2R) with the measure of
elementary sets defined by (III.7.4). Setting

    $\omega = \{n_0, n_1, n_2, \ldots\}$

and

    $X(s;\omega) = n_s$   ($s = 0, 1, 2, \ldots$)¹⁸

we check easily that $X(s;\omega)$ is a strongly stationary process (time
is now discretized). Thus the “statistical version” of the Poincaré
theorem is applicable and we obtain

(III.7.6)  $\sum_{k=1}^{\infty} P(n_0 \mid \underbrace{\bar{n}_0, \ldots, \bar{n}_0}_{k-1}, n_0) = 1$

where

    $P(n_0 \mid \underbrace{\bar{n}_0, \ldots, \bar{n}_0}_{k-1}, n_0)$

denotes the probability that if we start with $n_0$ balls in box A we
will find $n_0$ balls in A after k steps while during the intermediate
steps the number of balls in A is different from $n_0$ ($\bar{n}_0$ denotes any
integer different from $n_0$).

In other words, every initial state $n_0$ is bound to recur with
probability 1.


18 There is no difficulty in extending the definition to negative s. 


One can prove the analog of (III.5.1)¹⁹ and consequently
find that the mean recurrence time $\theta^*$ is

(III.7.7)  $\theta^* = \sum_{k=1}^{\infty} k\,P(n_0 \mid \underbrace{\bar{n}_0, \ldots, \bar{n}_0}_{k-1}, n_0) = 1/W(n_0) = 2^{2R}/{}^{2R}C_{n_0}$
As we shall see later, formula (III.7.7) can be proved by a 
direct calculation without recourse to the notion of a strongly 
stationary process. 
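
Formula (III.7.7) can also be checked numerically for small 2R by computing the expected first-return time from first-step equations. The sketch below is not the direct calculation announced in the text; it is an independent check, the function name `mean_recurrence_time` is hypothetical, and it relies on the fact that the chain moves only between neighboring states.

```python
from fractions import Fraction
from math import comb


def mean_recurrence_time(n0, total):
    """Expected number of drawings until n_A first returns to n0 (total = 2R balls)."""

    def up(i):
        return Fraction(total - i, total)    # probability of i -> i + 1

    def down(i):
        return Fraction(i, total)            # probability of i -> i - 1

    # climb[i]: expected drawings needed to reach i + 1 for the first time from i
    climb = {0: Fraction(1)}
    for i in range(1, total):
        climb[i] = (1 + down(i) * climb[i - 1]) / up(i)

    # descend[i]: expected drawings needed to reach i - 1 for the first time from i
    descend = {total: Fraction(1)}
    for i in range(total - 1, 0, -1):
        descend[i] = (1 + up(i) * descend[i + 1]) / down(i)

    # one drawing away from n0, then the expected time to come back
    result = Fraction(1)
    if n0 > 0:
        result += down(n0) * climb[n0 - 1]
    if n0 < total:
        result += up(n0) * descend[n0 + 1]
    return result


for n0 in (0, 5, 10):
    print(n0, mean_recurrence_time(n0, 10), Fraction(2 ** 10, comb(10, n0)))
```

For the 40 balls of the experiment, started with all balls in A, (III.7.7) gives a mean recurrence time of $2^{40}$ drawings, which is why no return to the initial state shows up in a record of a few hundred steps.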
However, the introduction of the measure based on (III.7.4)
is not merely a mathematical artifice.
The distribution

    $W(m) = {}^{2R}C_m\,2^{-2R}$

plays the role of the invariant measure (see Section 4) in the
dynamical formulation and the formula (III.7.5)

    $W(m) = \sum_n W(n)\,P(n|m)$


is the analog of Liouville’s theorem. 

As much as these analogies may appeal to us and as strongly 
as we may feel that the Ehrenfest model captures the essence of the 
dispute between Boltzmann and his opponents, it should be borne 
in mind that we have not yet clarified the fundamental Problem II 
(see p. 73). 

The model under discussion is purely statistical and the 
probability mechanism (i.e., drawing numbers “at random”) is 
postulated ad hoc. 

With this probability mechanism, suitable interpretations 
serve to reconcile irreversible behavior with time reversal and 
recurrence. 


8. It may be worthwhile to review briefly Boltzmann’s own 
views on the reconciliation of observed irreversible behavior with 
time reversal and recurrence. We follow the presentation of these 
views as given by the Ehrenfests in the encyclopedia article. 


19 We do not go into this here because later on we give an alternative 
derivation of the formula for the mean recurrence time. 


In addition to the phase space of the whole system ($\Gamma$-space)
consider also the phase space of a single particle ($\mu$-space). For a
monoatomic gas the $\mu$-space is six-dimensional (3 coordinates and
3 components of momentum).

Divide the $\mu$-space into cells $C_1, C_2, \ldots$ of equal
six-dimensional volume $|C|$.

Corresponding to a point in $\Gamma$-space (i.e., a given state of the
whole system) we get a sequence of integers

    $n_1, n_2, \ldots$   ($n_1 + n_2 + \ldots = N$ = total number of particles)

(“occupation numbers”) representing the numbers of particles in
cells $C_1, C_2, \ldots$ respectively. Conversely, however, to a set of
integers $n_1, n_2, \ldots$ subject to the condition

    $\sum n_i = N$

there corresponds a set Z (called “Z-star”) in $\Gamma$-space which
yields these integers as occupation numbers.
The 6N-dimensional volume of the Z-star is clearly

    $\dfrac{N!}{n_1!\,n_2! \cdots}\,|C|^N$

Suppose now that the cells $C_i$ are small enough so that denoting
by $\varepsilon_i$ the energy of a particle placed at some point in the cell $C_i$
(it is important to stress that one neglects the interaction energy
between particles!) we have to a good approximation

    $\sum n_i \varepsilon_i = E$

where E is the total energy of the system.
On the other hand, the cells must be large enough to assure
that the numbers $n_i$ are “sizeable.”
Boltzmann now finds the $n_i$ which maximize

    $\dfrac{N!}{n_1!\,n_2! \cdots}$

under the constraints

(III.8.1)  $\sum n_i = N$

(III.8.2)  $\sum n_i \varepsilon_i = E$

by the dubious (mathematically!) device of replacing $n_i!$ by the
value given by Stirling's formula (this is why one needs the n's to
be “sizeable”!) and treating the $n_i$'s as continuous variables.
The problem is thus reduced to minimizing

    $\sum n_i \log n_i$

under the constraints (III.8.1), (III.8.2) and the answer is

(III.8.3)  $n_i = \alpha \exp(-\beta\varepsilon_i)$

where $\alpha$ and $\beta$ are determined by substituting (III.8.3) into
(III.8.1) and (III.8.2).
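
The two steps compressed into this passage, the use of Stirling's formula and the constrained minimization, can be spelled out as follows. This is only a sketch of the standard argument; the multiplier $\lambda$ is an auxiliary symbol introduced here and does not appear in the text, and the usual approximation $\log n! \approx n\log n - n$ is assumed.

```latex
% Stirling's formula applied to every factorial, together with \sum_i n_i = N:
\log\frac{N!}{n_1!\,n_2!\cdots}
  \approx (N\log N - N) - \sum_i (n_i\log n_i - n_i)
  = N\log N - \sum_i n_i\log n_i .

% Maximizing the left side therefore amounts to minimizing \sum_i n_i\log n_i.
% Stationarity under (III.8.1) and (III.8.2), with multipliers \lambda and \beta:
\frac{\partial}{\partial n_i}\Big[\sum_k n_k\log n_k
   + \lambda\sum_k n_k + \beta\sum_k n_k\varepsilon_k\Big]
  = \log n_i + 1 + \lambda + \beta\varepsilon_i = 0
\quad\Longrightarrow\quad
n_i = e^{-1-\lambda}\,e^{-\beta\varepsilon_i} = \alpha\,e^{-\beta\varepsilon_i}.
```

The constants $\alpha = e^{-1-\lambda}$ and $\beta$ are then fixed by the two constraints, as stated above.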

The Ehrenfests explain Boltzmann’s views as follows. 

(a) The numbers

(III.8.4)  $n_i = \alpha \exp(-\beta\varepsilon_i)$

not only maximize the volume of the Z-star but maximize it
“overwhelmingly.” This means that “slight” deviations from
(III.8.3) produce a “tremendous” decrease in the volume of the
Z-star.

(b) The Z-star corresponding to (III.8.4) cuts out an
“overwhelming” portion of the energy surface

    $H(p_1, \ldots, p_n; q_1, \ldots, q_n) = E$


(c) By the ergodic theorem almost every trajectory describing 
the evolution of the system will spend an “‘overwhelming”’ portion 
of time in that part of the energy surface which is cut out by the 
maximal Z-star. 

Thus a system not in the state (III.8.4) will almost certainly
go into it, and a system in the state (III.8.4) will almost never get
out of it.

Although this argument is almost entirely lacking in mathe- 
matical rigor it has a strong aura of plausibility. 

From the physical point of view the trouble with this method
is that the details of the “approach to equilibrium” (i.e., how
starting from arbitrary $n_i$'s one goes over into the “equilibrium”
$n_i$'s given by (III.8.4)) are entirely absent, and the connection
with the kinetic method (i.e., Boltzmann's equation) is far from
clear. In particular, the monotonicity of the approach to
equilibrium is lost.


9. What is the analog for the Ehrenfest model of Boltzmann’s 
approach, just discussed? 

Suppose we start with $n_0$ balls in box A. It can then be
shown that in almost every²⁰ sequence

    $n_0, n_1, n_2, \ldots$

the frequency of a particular integer m is

    $W(m) = {}^{2R}C_m\,2^{-2R}$

This is the ergodic theorem for our system (also known as the
“strong law of large numbers”).²¹

Clearly W (m) is maximum for m = R although not “over- 
whelmingly” so. 

However a narrow band of states

    $R - \alpha(2R)^{1/2} < m < R + \alpha(2R)^{1/2}$

has “probability” approximately

    $(2\pi)^{-1/2} \int_{-2\alpha}^{2\alpha} \exp(-\tfrac{1}{2}u^2)\,du$

and for $\alpha = 5$ it is already “overwhelmingly” bigger than the
“probability” of the complementary range.

Thus if $n_0 > R + 5(2R)^{1/2}$ or $n_0 < R - 5(2R)^{1/2}$ the system
is very likely to go into the “near equilibrium” range, and once in
this range it will leave it only “briefly.”²²
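
The size of the “near equilibrium” band can be checked against the exact binomial weights. In the sketch below the function names are hypothetical and the value 2R = 1000 is an arbitrary illustration. The normal approximation rests on the observation that W(m) has standard deviation $(R/2)^{1/2} = \tfrac{1}{2}(2R)^{1/2}$, so that the band of half-width $\alpha(2R)^{1/2}$ covers $2\alpha$ standard deviations on either side of R.

```python
from math import comb, erf, sqrt


def band_probability(total, alpha):
    """Exact stationary probability of R - alpha*sqrt(2R) < m < R + alpha*sqrt(2R)."""
    half = total // 2                      # R, assuming total = 2R is even
    width = alpha * sqrt(total)
    inside = sum(comb(total, m) for m in range(total + 1) if abs(m - half) < width)
    return inside / 2 ** total


def normal_approximation(alpha):
    """(2*pi)^(-1/2) times the integral of exp(-u^2/2) over (-2*alpha, 2*alpha)."""
    return erf(2 * alpha / sqrt(2))


for alpha in (0.5, 1.0, 5.0):
    print(alpha, band_probability(1000, alpha), normal_approximation(alpha))
```

For $\alpha = 5$ the complementary range has a probability far below the resolution of double-precision arithmetic, which is the numerical face of “overwhelmingly.”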

This conclusion is well borne out by the experimental curve 
on p. 74. 

At this point it is also appropriate to give a short account of 
some ideas of J. W. Gibbs. 


20 “Almost every” means except for a set of sequences of measure 0. 
The measure in question is constructed on the basis of (III.7.4). 

21 This follows from the general ergodic theorem but can be proved 
very simply for the special case of the Ehrenfest model. Also one can use 
the so called “mean ergodic” theorem which is easier to prove. 

22 Still the probability of leaving the “near equilibrium” range is 1! 




Gibbs gives up from the beginning the idea of an individual
system and considers a distribution of systems over the whole of
$\Gamma$-space (or better yet between energy surfaces $E$ and $E + \Delta E$).

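If the systems are initially distributed according to the density
function:

    $D(p_1, \ldots, p_n; q_1, \ldots, q_n; 0)$

then the equations of motion permit one to determine the density

    $D_t = D(p_1, \ldots, p_n; q_1, \ldots, q_n; t)$

at time $t$. The equation governing the temporal evolution of D is
the famous Liouville equation:

    $\dfrac{\partial D}{\partial t} = \sum_{i=1}^{n} \left( \dfrac{\partial H}{\partial q_i}\dfrac{\partial D}{\partial p_i} - \dfrac{\partial D}{\partial q_i}\dfrac{\partial H}{\partial p_i} \right)$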

where H is the Hamiltonian of the system. 

Now, Gibbs tried to prove that in some sense $D_t$ approaches the
uniform density between the energy surfaces $E$ and $E + \Delta E$
(microcanonical distribution).

The sense in which one can hope to prove this theorem was 
first elucidated by Ehrenfest who introduced the notion of ‘‘coarse- 
grained” density (“Grobe Dichte”). 

Divide the shell between the energy surfaces $E$ and $E + \Delta E$
into a finite number of fixed cells $\Delta_1, \Delta_2, \ldots, \Delta_m$ and define the
“coarse-grained” density $P_t$ as follows:

If $\xi$ denotes a point in $\Delta_i$ then

    $P_t(\xi) = P_t^{(i)} = \dfrac{1}{|\Delta_i|} \int_{\Delta_i} D_t(p, q)\,dp_1 \ldots dp_n\,dq_1 \ldots dq_n$

($|\Delta_i|$ = “volume” of $\Delta_i$).

Suppose now that at $t = 0$ the density $D_0$ is constant over
each cell (i.e., at $t = 0$, $D_0 = P_0$); then Gibbs's “theorem” states
that

    $P_t(\xi) \to \mathrm{const}$

regardless of the cell in which $\xi$ is picked.
This is almost hopeless to prove and it is a stronger statement
than the ergodic theorem. Gibbs's famous argument (based on
analogy with mixing fluids) apart from containing an actual error
was, at best, an indication of the plausibility of the theorem.
However, the following amusing theorem was proved by
Ehrenfest (who corrected an error in Gibbs).
Define $\eta_t$ (the “Gibbs entropy”) by the formula:

(III.9.0)  $\eta_t = \sum_i |\Delta_i|\,P_t^{(i)} \log P_t^{(i)} = \int P_t \log P_t\,dp_1 \ldots dp_n\,dq_1 \ldots dq_n$

then:

    $\eta_t \le \eta_0$

The proof is extremely simple and is based on the inequality:

(III.9.1)  $x \log x - x \log y - x + y \ge 0$

We write ($d\tau = dp_1 \ldots dp_n\,dq_1 \ldots dq_n$):

    $\eta_t - \eta_0 = \int P_t \log P_t\,d\tau - \int D_0 \log D_0\,d\tau$
        $= \int D_t \log P_t\,d\tau - \int D_0 \log D_0\,d\tau$
        $= \int (D_t \log P_t - D_0 \log D_0)\,d\tau$

From Liouville's theorem it follows that

    $\int D_t \log D_t\,d\tau = \int D_0 \log D_0\,d\tau$

and since

    $\int D_t\,d\tau = \int P_t\,d\tau$

we have

    $\eta_t - \eta_0 = \int (D_t \log P_t - D_t \log D_t + D_t - P_t)\,d\tau$

and hence by the inequality (III.9.1)

    $\eta_t \le \eta_0$
This proves that the Gibbs entropy (actually, negative entropy) is 
largest for the coarsely uniform ensemble but tells us nothing 
about its behavior in time. 
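
A discrete caricature makes the content of the inequality visible. The sketch below is an analogy under stated assumptions, not the phase-space statement itself: fine cells of unit volume replace the volume element, a fixed permutation of the fine cells plays the role of the measure-preserving flow (the analog of Liouville's theorem), and the function names `eta`, `coarse_grain` and `evolve` as well as the particular densities and permutation are illustrative choices.

```python
from math import isclose, log


def eta(density):
    """Discrete analog of the integral of D log D (every fine cell has unit volume)."""
    return sum(d * log(d) for d in density if d > 0)


def coarse_grain(density, cell_size):
    """Replace the density on each coarse cell by its average over that cell."""
    coarse = []
    for start in range(0, len(density), cell_size):
        block = density[start:start + cell_size]
        coarse.extend([sum(block) / len(block)] * len(block))
    return coarse


def evolve(density, permutation):
    """Measure-preserving 'flow': the content of fine cell i is carried to permutation[i]."""
    moved = [0.0] * len(density)
    for i, target in enumerate(permutation):
        moved[target] = density[i]
    return moved


# D_0 is constant on each coarse cell of 4 fine cells, so that D_0 = P_0.
d0 = [0.1] * 4 + [0.05] * 4 + [0.075] * 4 + [0.025] * 4
perm = [(5 * i + 3) % 16 for i in range(16)]   # a bijection of the 16 fine cells

dt = evolve(d0, perm)          # fine-grained density after the "flow"
pt = coarse_grain(dt, 4)       # coarse-grained density P_t

print("eta_0 =", eta(d0))
print("fine-grained value after the flow =", eta(dt))
print("eta_t =", eta(pt))
```

The fine-grained quantity is unchanged by the permutation, while the coarse-grained one cannot exceed its initial value, in line with $\eta_t \le \eta_0$; as in the theorem, nothing is implied about how $\eta_t$ behaves as a function of time.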
Gibbs’s idea that probability should enter Mechanics only
through D(0) is, of course, very appealing. In general, this view 
is probably untenable and probability must be made to intervene 
in some other ways. We shall discuss this point later in Sections 
14 and 15. 

One can incorporate the Gibbs view into the discussion of 
probabilistic models like the Ehrenfest model. 

One simply takes the view that at $t = 0$ we have a distribution
$D(n; 0)$²³ of initial states and tries to determine $D(m; s)$, i.e., the
probability of finding the system in state m after s moves.

Gibbs's theorem for this case would be

    $\lim_{s\to\infty} D(m; s) = W(m) = {}^{2R}C_m\,2^{-2R}$

As remarked on p. 76 this is not the case if

    $D(n; 0) = 1$,  $n = n_0$
    $D(n; 0) = 0$,  $n \ne n_0$

and the reason for this must be due to the fact that this distribu- 
tion is not “sufficiently coarse grained.” We shall return to this 


point after the more detailed discussion of the Ehrenfest model 
which follows in the next two sections. 
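
The failure of convergence for a sharply concentrated initial distribution can be observed directly by iterating the one-step rule. The sketch below is illustrative: the function names and the choice 2R = 6 are assumptions, and the only fact relied upon is that each drawing changes $n_A$ by exactly one, so that after s drawings $n_A$ has the same parity as $n_0 + s$.

```python
from fractions import Fraction
from math import comb


def evolve_distribution(dist, total):
    """One drawing: D(m; s+1) = sum over n of D(n; s) P(n|m)."""
    new = [Fraction(0)] * (total + 1)
    for n, weight in enumerate(dist):
        if n > 0:
            new[n - 1] += weight * Fraction(n, total)
        if n < total:
            new[n + 1] += weight * Fraction(total - n, total)
    return new


def distribution_after(n0, steps, total):
    """D(m; steps) when D(n; 0) is concentrated at n = n0."""
    dist = [Fraction(0)] * (total + 1)
    dist[n0] = Fraction(1)
    for _ in range(steps):
        dist = evolve_distribution(dist, total)
    return dist


total = 6
w = [Fraction(comb(total, m), 2 ** total) for m in range(total + 1)]
d40 = distribution_after(total, 40, total)
d41 = distribution_after(total, 41, total)

print([float(x) for x in d40])   # odd m carry no weight after an even number of steps
print([float(x) for x in d41])   # even m carry no weight after an odd number of steps
print([float((a + b) / 2) for a, b in zip(d40, d41)], [float(x) for x in w])
```

In this small example the distribution keeps oscillating between the two parity classes, so it has no limit, while the average over two consecutive steps is already very close to W(m).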


10. We shall now show how to calculate

(III.10.1)  $\mathrm{Prob}\{n_A(s) = m \mid n_A(0) = n\}$

for the Ehrenfest model.
We introduce the abbreviations

(III.10.2)  $\mathrm{Prob}\{n_A(s) = m \mid n_A(0) = n\} = P(n|m; s)$

(III.10.3)  $P(n|m; 1) = P(n|m) = Q(n|m)$ ²⁴

and recall that

(III.10.4)  $P(n|m; s) = \sum_{n_1, \ldots, n_{s-1}} Q(n|n_1)\,Q(n_1|n_2) \cdots Q(n_{s-1}|m)$

This is, of course, a consequence of the way the model was
constructed, and the so-called Markoffian property, i.e., that the
probability of the finite sequence

    $n, n_1, \ldots, n_{s-1}, m$

is $Q(n|n_1)\,Q(n_1|n_2) \cdots Q(n_{s-1}|m)$, is here a consequence of
independence of successive drawings.

23 That is $D(n; 0) \ge 0$ and $\sum_n D(n; 0) = 1$.
24 This notation is introduced to agree with Uhlenbeck’s notation in
Part 1 of Appendix I.
The transition probabilities $Q(n|m)$ are given by the formula

(III.10.5)  $Q(n|m) = (2R)^{-1} n\,\delta(m - n + 1) + (2R)^{-1}(2R - n)\,\delta(m - n - 1)$

If Q denotes the matrix of transition probabilities, i.e., the matrix
whose $(n, m)$ element is $Q(n|m)$ we see from (III.10.4) that:

(III.10.6)  $P(n|m; s) = (n, m)$ element of $Q^s$


This simple observation is crucial in the theory of Markoff processes 
and reduces the theory to the study of powers of matrices (or, more 
generally, certain linear operators). 

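To get an explicit result one tries to diagonalize Q, i.e., find a
nonsingular matrix T such that

(III.10.7)  $TQT^{-1} = \begin{pmatrix} \lambda_0 & & 0 \\ & \ddots & \\ 0 & & \lambda_{2R} \end{pmatrix}$

Once this is accomplished one gets

(III.10.8)  $Q^s = T^{-1} \begin{pmatrix} \lambda_0^s & & 0 \\ & \ddots & \\ 0 & & \lambda_{2R}^s \end{pmatrix} T$

and the job is finished!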

It is well known that, in general, nonsymmetric matrices can- 
not be diagonalized. However, in our special case the diagonaliza- 
tion can be performed explicitly. 


First the matrix Q is seen to be

    $Q = \begin{pmatrix}
    0 & 1 & 0 & 0 & \cdots & 0 \\
    (2R)^{-1} & 0 & 1 - (2R)^{-1} & 0 & \cdots & 0 \\
    0 & 2(2R)^{-1} & 0 & 1 - 2(2R)^{-1} & \cdots & 0 \\
    \vdots & & & & & \vdots \\
    0 & \cdots & & 0 & 1 & 0
    \end{pmatrix}$

Then its left eigenvectors are determined by the system of linear
equations:

    $\sum_{n=0}^{2R} x_n\,Q(n|k) = \lambda x_k$,   $k = 0, 1, \ldots, 2R$

Consider now the infinite system

(III.10.9)  $[1 - (k-1)(2R)^{-1}]\,x_{k-1} + (k+1)(2R)^{-1}\,x_{k+1} = \lambda x_k$,   $k = 0, 1, 2, \ldots$

with $x_{-1} = 0$, and notice that a solution of the infinite system
satisfying

(III.10.10)  $x_{2R+1} = 0$

is automatically a solution of the finite system and hence a left
eigenvector.


Introducing the generating function

    $f(z) = \sum_{k=0}^{\infty} x_k z^k$

one obtains from (III.10.9)

    $f'(z) = 2R(\lambda - z)(1 - z^2)^{-1} f(z)$

or

(III.10.11)  $f(z) = x_0 (1 - z)^{R(1-\lambda)} (1 + z)^{R(1+\lambda)}$

Since f is analytic near the origin, the formal procedure is
justified.


We note now that if

(III.10.12)  $\lambda = j/R$,   $j = -R, -R+1, \ldots, 0, \ldots, R$

f is a polynomial of degree 2R and hence (III.10.10) is satisfied.
The numbers $\lambda = jR^{-1}$ are eigenvalues of Q, and since there
are exactly 2R + 1 of them, we have found all the eigenvalues of Q.
The left eigenvectors are now easily seen to have the
components

    $C_0^{(j)}, C_1^{(j)}, \ldots, C_{2R}^{(j)}$

defined by the identity

(III.10.13)  $(1 - z)^{R-j}(1 + z)^{R+j} = \sum_{k=0}^{2R} C_k^{(j)} z^k$

The left eigenvectors form the rows of the matrix T and it
remains to find the inverse matrix $T^{-1}$.
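To do this we must solve the equation

    $\sum_{j=-R}^{R} \alpha_j^{(r)} C_k^{(j)} = \delta(k - r)$,   $k = 0, 1, 2, \ldots, 2R$ ²⁵

We have

    $\sum_{k=0}^{2R} z^k \sum_{j=-R}^{R} \alpha_j^{(r)} C_k^{(j)} = \sum_{j=-R}^{R} \alpha_j^{(r)} (1 - z)^{R-j}(1 + z)^{R+j}$
        $= (1 - z)^{2R} \sum_{j=-R}^{R} \alpha_j^{(r)} (1 + z)^{R+j}(1 - z)^{-(R+j)}$
        $= (1 - z)^{2R} \sum_{l=0}^{2R} \alpha_{l-R}^{(r)} (1 + z)^{l}(1 - z)^{-l}$

or in other words

    $z^r = (1 - z)^{2R} \sum_{l=0}^{2R} \alpha_{l-R}^{(r)} (1 + z)^{l}(1 - z)^{-l}$

Setting

    $\xi = (1 + z)(1 - z)^{-1}$

we get

    $z = -(1 - \xi)(1 + \xi)^{-1}$,   $1 - z = 2(1 + \xi)^{-1}$

or

    $(-1)^r\,2^{-2R}\,(1 - \xi)^r (1 + \xi)^{2R-r} = \sum_{l=0}^{2R} \alpha_{l-R}^{(r)}\,\xi^l$

Comparing this with (III.10.13) we get:

    $\alpha_{l-R}^{(r)} = (-1)^r\,2^{-2R}\,C_l^{(R-r)}$

or

    $\alpha_j^{(r)} = (-1)^r\,2^{-2R}\,C_{R+j}^{(R-r)}$

It is now easy to compute $P(n|m; s)$ (see (III.10.6) and (III.10.8))
and one gets

(III.10.14)  $P(n|m; s) = (-1)^n\,2^{-2R} \sum_{j=-R}^{R} (j/R)^s\,C_{R+j}^{(R-n)}\,C_m^{(j)}$

25 In this way we are really finding the inverse of the transpose of T,
but this makes no difference.
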
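
For small R the closed form can be compared with the direct evaluation of the $(n, m)$ element of $Q^s$. The sketch below is a check under assumptions rather than part of the derivation: the function names are hypothetical, the coefficients $C_k^{(j)}$ are obtained by expanding $(1-z)^{R-j}(1+z)^{R+j}$ as in (III.10.13), and the formula is taken in the form $P(n|m;s) = (-1)^n 2^{-2R} \sum_j (j/R)^s C_{R+j}^{(R-n)} C_m^{(j)}$.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def coefficients(j, R):
    """C_k^(j), k = 0..2R: coefficients of (1 - z)^(R - j) (1 + z)^(R + j)."""
    poly = [1]
    for _ in range(R - j):
        poly = poly_mul(poly, [1, -1])
    for _ in range(R + j):
        poly = poly_mul(poly, [1, 1])
    return poly


def formula(n, m, s, R):
    """Closed form for P(n|m; s) as a sum over the eigenvalues j/R."""
    total = sum(
        Fraction(j, R) ** s * coefficients(R - n, R)[R + j] * coefficients(j, R)[m]
        for j in range(-R, R + 1)
    )
    return (-1) ** n * total / 2 ** (2 * R)


def matrix_power_entry(n, m, s, R):
    """(n, m) element of Q^s, obtained by propagating a unit mass at n for s drawings."""
    balls = 2 * R
    dist = [Fraction(0)] * (balls + 1)
    dist[n] = Fraction(1)
    for _ in range(s):
        new = [Fraction(0)] * (balls + 1)
        for k, weight in enumerate(dist):
            if k > 0:
                new[k - 1] += weight * Fraction(k, balls)
            if k < balls:
                new[k + 1] += weight * Fraction(balls - k, balls)
        dist = new
    return dist[m]


print(formula(6, 4, 2, 3), matrix_power_entry(6, 4, 2, 3))
```

Taking s = 0 in the comparison also checks that the coefficients $\alpha_j^{(r)}$ really invert the matrix of left eigenvectors.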
11. The solution given in the preceding section was historically 
the first (1947). It followed along the standard lines of the theory 
of Markoff chains, and, as the reader can see, it required mild 
trickery. Soon after its appearance in print, several other solu- 
tions were given. Of these a solution found independently by 
A. J. F. Siegert and F. G. Hess is of considerable interest. We 
follow the presentation of Hess. 

The state of our system in the solution of Section 10 was de- 
termined by the number of balls in the box A. With this definition 
of state the probabilistic model became a Markoff chain with 
transition probabilities given by the matrix Q. 

Hess and Siegert propose a more complicated definition of the 
state of the system. With their definition, the process is still a 
Markoff chain, but the matrix of transition probabilities becomes 
extremely simple, and the diagonalization can be performed al- 
most immediately. 

According to Siegert and Hess, the state of the system is de- 
fined by specifying the exact location of each ball.26 Introducing 


the vectors 
l 0 
g= B and p = H 


2 Those familiar with the terminology used in statistical mechanics 
will recognize that we are dealing here with ‘“‘microstates’’ while in the 
previous solution we dealt with ‘‘macrosiates.”’ 


III. PROBABILITY IN CLASSICAL STATISTICAL MECHANICS 91 


we now represent the state by the tensor product 
N = & X & X... X ER 


where each e is either « or 8. «œ corresponds to box A and f to B 
so that, for instance, 


EXXX XE 


corresponds to the state in which al? balls except number 2 are in A. 
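Introduce now the matrix

$S = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

which has the property that
$S\alpha = \beta, \quad S\beta = \alpha$
and consider the tensor product
$S_i = I \times I \times \cdots \times S \times \cdots \times I$
($I$ is the 2 × 2 identity matrix, and $S$ stands in the $i$th place).
By definition
$S_i \eta = \varepsilon_1 \times \cdots \times S\varepsilon_i \times \cdots \times \varepsilon_{2R}$

and so operating with $S_i$ on $\eta$ is tantamount to moving the $i$th ball
from wherever it is to the other box.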
It is clear now that

$\left( (2R)^{-1} \sum_{i=1}^{2R} S_i \right)^{s} \eta$

is a linear combination of $2^{2R}$ vectors27 corresponding to all distinct
states (in the new sense) of our system. The coefficient of $\eta_1$ in
this expansion is then clearly the probability

$P(\eta|\eta_1; s)$

that starting in state $\eta$ we shall find the system in state $\eta_1$ after $s$
steps.
If we define (as is common) the dot product of 


27 Actually tensor products, but we shall refer to them as vectors 
without fear of confusion. 




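$\delta_1 \times \delta_2 \times \cdots \times \delta_{2R}$ and $\varepsilon_1 \times \varepsilon_2 \times \cdots \times \varepsilon_{2R}$
by the formula

$(\delta_1 \times \cdots \times \delta_{2R}) \cdot (\varepsilon_1 \times \cdots \times \varepsilon_{2R}) = \prod_{i=1}^{2R} (\delta_i \cdot \varepsilon_i)$

we can write

$P(\eta|\eta_1; s) = \eta_1 \cdot \left[ \left( (2R)^{-1} \sum_{i=1}^{2R} S_i \right)^{s} \eta \right]$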


It is now easy to convince oneself that
$P(n|m; s) = \frac{1}{{}^{2R}C_n} \sum P(\eta|\eta_1; s)$
where the summation is over all ${}^{2R}C_n$ states $\eta$ corresponding to
having $n$ balls in A and over all ${}^{2R}C_m$ states $\eta_1$ corresponding to
having $m$ balls in A.
It is now clear that all one has to do is to diagonalize the
operator

$(2R)^{-1} \sum_{i=1}^{2R} S_i$


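The normalized eigenvectors of $S$ are

$\mu = 2^{-1/2} \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and $\nu = 2^{-1/2} \begin{pmatrix} 1 \\ -1 \end{pmatrix}$

(the eigenvalues being +1 and −1 respectively) and clearly
$\alpha = 2^{-1/2}(\mu + \nu), \quad \beta = 2^{-1/2}(\mu - \nu)$

Furthermore, it is clear that
$\left( (2R)^{-1} \sum_{i=1}^{2R} S_i \right) \delta_1 \times \cdots \times \delta_{2R}$
where each $\delta_i$ is either $\mu$ or $\nu$, is
$(2R)^{-1}\, 2j\; \delta_1 \times \cdots \times \delta_{2R}$

where $2j$ = number of $\mu$'s − number of $\nu$'s in $\delta_1 \times \cdots \times \delta_{2R}$.
Thus

$j/R$

is an eigenvalue of our operator (of multiplicity ${}^{2R}C_{R+j}$) the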




corresponding eigenvectors being
$\delta_1 \times \delta_2 \times \cdots \times \delta_{2R}$

with exactly $R + j$ $\mu$'s.

It is now a matter of routine to rederive (III.10.14). 
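
The two claims on which this derivation rests can be checked by brute force for a small number of balls. The following is only a sketch under stated assumptions: the function names are ours, a microstate is coded as a tuple of 0's and 1's (1 meaning "ball in A"), and exact rational arithmetic is used so that the comparisons are equalities. It verifies that lumping the microscopic probabilities reproduces the macroscopic $P(n|m; s)$, and that the trace of the $s$th power of the operator equals the sum of $(j/R)^s$ taken with the multiplicities ${}^{2R}C_{R+j}$.

```python
from fractions import Fraction
from itertools import product
from math import comb


def macro_step(dist, two_r):
    """One drawing in the macro description: dist[n] = Prob{n balls in A}."""
    new = [Fraction(0)] * (two_r + 1)
    for n, p in enumerate(dist):
        if n > 0:
            new[n - 1] += p * Fraction(n, two_r)          # a ball leaves A
        if n < two_r:
            new[n + 1] += p * Fraction(two_r - n, two_r)  # a ball enters A
    return new


def micro_step(dist, two_r):
    """Apply (2R)^{-1} * sum_i S_i to a distribution over microstates."""
    new = {}
    for state, p in dist.items():
        for i in range(two_r):
            moved = state[:i] + (1 - state[i],) + state[i + 1:]
            new[moved] = new.get(moved, Fraction(0)) + p / two_r
    return new


def macro_prob(n, m, s, two_r):
    """P(n|m; s) computed directly from the macro chain."""
    dist = [Fraction(0)] * (two_r + 1)
    dist[n] = Fraction(1)
    for _ in range(s):
        dist = macro_step(dist, two_r)
    return dist[m]


def lumped_micro_prob(n, m, s, two_r):
    """Average over microstates with n balls in A, sum over those with m."""
    starts = [e for e in product((0, 1), repeat=two_r) if sum(e) == n]
    total = Fraction(0)
    for start in starts:
        dist = {start: Fraction(1)}
        for _ in range(s):
            dist = micro_step(dist, two_r)
        total += sum(p for e, p in dist.items() if sum(e) == m)
    return total / comb(two_r, n)


def trace_from_spectrum(s, two_r):
    """Sum of (j/R)^s with multiplicity C(2R, R+j); here k stands for R + j."""
    return sum(comb(two_r, k) * Fraction(2 * k - two_r, two_r) ** s
               for k in range(two_r + 1))


def trace_by_enumeration(s, two_r):
    """Sum over microstates of the probability of being back after s steps."""
    total = Fraction(0)
    for start in product((0, 1), repeat=two_r):
        dist = {start: Fraction(1)}
        for _ in range(s):
            dist = micro_step(dist, two_r)
        total += dist.get(start, Fraction(0))
    return total
```

The enumeration grows as $2^{2R}$, so the check is meant for $2R$ of the order of 4 to 10 only.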

Apart from its elegance this derivation contains a suggestion 
which may prove to be of considerable generality and importance. 

Ordinarily the complexity of a problem increases when one
asks for more details. Moreover, too detailed a description may in
itself be quite irrelevant.28 The Hess-Siegert method shows that
this need not always be the case.

It is contraction (or throwing away information) that com- 
plicates the problem! 

The relations between detailed and contracted descriptions 
are at the very heart of statistical physics, and their proper 
elucidation is still far in the future. 

A final remark is in order. In general, by lumping states into
groups and treating the groups as new states we destroy the
Markoffian character of the process.29

In fact, one must often “unlump” to recover the original
Markoffian process.


12. Armed with formula (III.10.14) which gives a complete solu- 
tion to the Ehrenfest problem we go back to the final point of 
Section 9. 

First note that as $s \to \infty$ all terms but two in (III.10.14) go to
0. The ones which do not, correspond to $j = R$ and $j = -R$.
The term corresponding to $j = R$ is

${}^{2R}C_m\, 2^{-2R} = W(m)$
and the term corresponding to $j = -R$ is
$(-1)^{n+m+s}\, W(m)$


28 In this connection see the special lectures by Uhlenbeck in Appendix 
I of this volume. 

29 Failure to realize this leads to errors. An otherwise excellent review
article of S. Chandrasekhar in Rev. Mod. Phys., 1943 is somewhat marred
by erroneous “lumpings” in many places in Chapter III.




Thus
$D(m, s) = \sum_n D(n; 0)\, P(n|m; s) \sim W(m) + (-1)^{m+s}\, W(m) \sum_n (-1)^n D(n; 0)$


Clearly if D (n; 0) is slowly varying the second term is small, 
especially when R is large. 

Strictly speaking, even coarse-grained D(m, s) does not ap- 
proach W (m) but the error is roughly of order R~ and can be safely 
neglected. 

From (III.10.14) one can calculate various averages although 
these can be also obtained directly without the use of (III.10.14). 

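One obtains

$\langle n_A(s) - R \rangle = (n_0 - R)(1 - R^{-1})^s$
$\langle (n_A(s) - R)^2 \rangle = (n_0 - R)^2 (1 - 2R^{-1})^s + \frac{R}{2}\left[1 - (1 - 2R^{-1})^s\right]$
Thus
$\langle [(n_A(s) - R) - \langle n_A(s) - R \rangle]^2 \rangle = (n_0 - R)^2\left[(1 - 2R^{-1})^s - (1 - R^{-1})^{2s}\right] + \frac{R}{2}\left[1 - (1 - 2R^{-1})^s\right]$
and finally,

$\varphi^2 = \frac{\langle [(n_A(s) - R) - \langle n_A(s) - R \rangle]^2 \rangle}{\langle n_A(s) - R \rangle^2} = \frac{R\left[1 - (1 - 2R^{-1})^s\right]}{2 (n_0 - R)^2 (1 - R^{-1})^{2s}} + \frac{(1 - 2R^{-1})^s}{(1 - R^{-1})^{2s}} - 1$

The quantity we have calculated is the square of the so-called
“relative fluctuation,” and it measures the stability of the mean.
Indeed, by Tchebysheff’s classical inequality

$\varepsilon^2\, \mathrm{Prob}\left\{ \left| \frac{n_A(s) - R}{\langle n_A(s) - R \rangle} - 1 \right| > \varepsilon \right\} \le \varphi^2$
and hence
$\mathrm{Prob}\left\{ \left| \frac{n_A(s) - R}{\langle n_A(s) - R \rangle} - 1 \right| > \varepsilon \right\} \le \frac{\varphi^2}{\varepsilon^2}$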




If $n_0 = 2R$ and $s \sim xR$ we get

$\varphi^2 \sim (\exp(2x) - 1)(2R)^{-1} - sR^{-2}$
and hence $\varphi^2/\varepsilon^2$ is small so long as $\varepsilon^2 R \gg 1$.

If $n_0 = 2R$ and $s \sim R^2$, $\varphi^2$ becomes enormous (roughly of the
order $(2R)^{-1} \exp(2R)$) and the inequality becomes useless. It can
actually be shown that in this case large deviations from the
average are quite probable and the “average” curve

$R(1 - R^{-1})^s$

becomes meaningless. On the other hand, as mentioned in Section
7 the mean recurrence time of the state $n_0 = 2R$ is $2^{2R}$ and
$R^2 \ll 2^{2R}$. It is thus incorrect to maintain that irreversible
behavior will, in general, be observed for times short compared to
the Poincaré cycle.

It is true that it is highly improbable for state $n_0 = 2R$ to
recur within the time of the order $R^2$, but to expect that some sort
of monotonic behavior will persist during all that time is the height 
of optimism! 
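
The first two moments used in this section follow from the one-step rule of the model alone, and it may be reassuring to check the closed forms against the chain itself. The sketch below is ours and is not taken from the text: it assumes the usual Ehrenfest rule (a ball leaves A with probability $n/2R$), writes `n0` for the initial number of balls in A, and uses exact rationals. It requires $R \ge 2$ so that the mean does not vanish identically.

```python
from fractions import Fraction


def evolve(n0, steps, r):
    """Exact distribution of the number of balls in A after `steps` drawings."""
    two_r = 2 * r
    dist = [Fraction(0)] * (two_r + 1)
    dist[n0] = Fraction(1)
    for _ in range(steps):
        new = [Fraction(0)] * (two_r + 1)
        for n, p in enumerate(dist):
            if n > 0:
                new[n - 1] += p * Fraction(n, two_r)
            if n < two_r:
                new[n + 1] += p * Fraction(two_r - n, two_r)
        dist = new
    return dist


def moments_from_chain(n0, s, r):
    """<n_A(s) - R> and <(n_A(s) - R)^2> computed from the distribution."""
    dist = evolve(n0, s, r)
    mean = sum(p * (n - r) for n, p in enumerate(dist))
    second = sum(p * (n - r) ** 2 for n, p in enumerate(dist))
    return mean, second


def moments_closed_form(n0, s, r):
    """The same two averages from the closed formulas."""
    a = 1 - Fraction(1, r)   # 1 - R^{-1}
    b = 1 - Fraction(2, r)   # 1 - 2 R^{-1}
    mean = (n0 - r) * a ** s
    second = (n0 - r) ** 2 * b ** s + Fraction(r, 2) * (1 - b ** s)
    return mean, second


def relative_fluctuation_squared(n0, s, r):
    """Variance divided by the squared mean (requires a nonzero mean)."""
    mean, second = moments_closed_form(n0, s, r)
    return (second - mean ** 2) / mean ** 2
```

Evaluating `relative_fluctuation_squared(2 * r, s, r)` for $s$ of order $R$ and of order $R^2$ exhibits the two regimes discussed above.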


13. The Markoffian character of the Ehrenfest model (as made
evident by (III.10.4)) makes it possible to discuss the recurrence
time problem in considerable detail. Let

$P'(n|m; s)$
denote the probability that after $s$ drawings, $m$ balls will be
observed for the first time if there were $n$ balls in that box at the
beginning.
We then have the following relation

(III.13.1)  $P(n|m; s) = P'(n|m; s) + \sum_{k=1}^{s-1} P'(n|m; k)\, P(m|m; s - k)$


To see this we note that starting from n we can reach m in s 
steps in the following s mutually exclusive ways. (a) m is reached 
for the first time after step s. (b) m is reached for the first time in 
1 step and (starting from m) it is reached again in (s — 1) steps. 
(c) m is reached for the first time in 2 steps, and (starting from m) 
it is reached again in (s — 2) steps, etc. 




This separation into mutually exclusive possibilities explains 


the sum in (III.13.1). 
The Markoffian character of the process is responsible for the 


product 
P’(n|m; k) P(m|m; s — k) 
which is simply the probability that m is reached (starting from n) 
for the first time in k steps and then (starting from m) reached 
again in (s — k) steps. 
Introducing the generating functions

$h(n|m; z) = \sum_{s=1}^{\infty} P'(n|m; s)\, z^s$

$g(n|m; z) = \sum_{s=1}^{\infty} P(n|m; s)\, z^s$

we see that (III.13.1) is equivalent to
(III.13.2)  $g(n|m; z) = h(n|m; z) + h(n|m; z)\, g(m|m; z)$, or
$h(n|m; z) = g(n|m; z)/(1 + g(m|m; z))$
Since $P(n|m; s)$ is known (see (III.10.14)) we know $g(n|m; z)$
and hence, in principle at least, we know $h(n|m; z)$ and $P'(n|m; s)$.
Actually, explicit formulas for $P'(n|m; s)$ are very difficult to
obtain.
The following results are however almost immediate:

$h(n|n; 1) = \sum_{s=1}^{\infty} P'(n|n; s) = 1$

$\lim_{z \to 1} \frac{dh}{dz} = \sum_{s=1}^{\infty} s\, P'(n|n; s) = 1/W(n) = 2^{2R}/{}^{2R}C_n = \theta_n$


These are already known to us from a much more general 
point of view. 
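
Although explicit formulas for $P'$ are hard to come by, relation (III.13.1) with $n = m$ is a perfectly good recursion, and the two results just quoted can be watched numerically. The following sketch makes several assumptions of our own: floating-point arithmetic, a truncation of the infinite sums at `s_max` steps, and function names that do not occur in the text.

```python
from math import comb


def return_probabilities(n, two_r, s_max):
    """P(n|n; s) for s = 0, 1, ..., s_max in the Ehrenfest chain."""
    dist = [0.0] * (two_r + 1)
    dist[n] = 1.0
    probs = [1.0]
    for _ in range(s_max):
        new = [0.0] * (two_r + 1)
        for k, p in enumerate(dist):
            if k > 0:
                new[k - 1] += p * k / two_r
            if k < two_r:
                new[k + 1] += p * (two_r - k) / two_r
        dist = new
        probs.append(dist[n])
    return probs


def first_return_probabilities(probs):
    """Solve (III.13.1) with n = m for P'(n|n; s), s = 1, ..., s_max."""
    first = [0.0] * len(probs)
    for s in range(1, len(probs)):
        first[s] = probs[s] - sum(first[k] * probs[s - k] for k in range(1, s))
    return first


def mean_recurrence_time(n, two_r, s_max=200):
    """Truncated sum of s * P'(n|n; s); compare with 2^{2R} / C(2R, n)."""
    first = first_return_probabilities(return_probabilities(n, two_r, s_max))
    return sum(s * f for s, f in enumerate(first))


def expected_mean_recurrence_time(n, two_r):
    return 2 ** two_r / comb(two_r, n)
```

For states near equilibrium the truncated sums converge quickly; for states far from equilibrium `s_max` must be taken much larger, which is itself an illustration of the long recurrence times involved.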

However, now we can go further and calculate, for instance
the variance of the recurrence time:

$\sum_{s=1}^{\infty} (s - \theta_n)^2\, P'(n|n; s) = \Delta_n$



Without going into details we mention that for $n = 2R$
$\Delta_{2R} \sim (2^{2R})^2 = (\theta_{2R})^2$
which shows that recurrences of states far from equilibrium exhibit 
such violent fluctuations that the mean recurrence time has almost 
no meaning! 

This is another manifestation of the phenomenon discussed 
at the end of Section 12. 

To conclude our treatment of the Ehrenfest model we discuss 
briefly the notion of entropy as it applies to the model. We follow 
the presentation of M. J. Klein. 

There are two definitions of entropy, one due to Boltzmann
and the other to Gibbs.

Boltzmann defines entropy as the logarithm of the number of 
microstates which realize the given macrostate (actually k times 
the logarithm, where k is the Boltzmann constant, but this is not 
important for the artificial model under consideration). 

In other words

$S_B = \log {}^{2R}C_m$

Gibbs on the other hand defines entropy in terms of $P(m; s)$
(i.e., the probability of finding the system in state $m$ at time $s$).

His definition yields
(III.13.3)  $S_G = - \sum_m P(m; s) \log\left[ P(m; s)\, 2^{-2R}/W(m) \right]$

where $W(m)$ is given by the formula:
(III.13.4)  $W(m) = {}^{2R}C_m\, 2^{-2R}$


Observe that (III.13.3) is quite analogous to (III.9.0). 

What about the two notions in equilibrium? 

It should be noted that the meanings of equilibrium are quite 
different for Boltzmann and Gibbs. 

To Boltzmann equilibrium is a state of maximum entropy,
i.e.,

$m = R$
Thus
$S_{B;\,\mathrm{eq}} = \log {}^{2R}C_R \sim 2R \log 2$




(if one neglects terms of order log R and higher). 

To Gibbs equilibrium is not a state but a distribution. In
other words, Gibbs obtains properties of a system in physical
equilibrium by averaging over all possible states; but the averaging
is with respect to a probability distribution for which the Gibbs
entropy is maximum. Thus

$S_{G;\,\mathrm{eq}} = \max S_G$

and it is seen (by repeating the argument at the end of Section 9)
that the maximum is achieved for

$P(m; s) = W(m)$
and hence
$S_{G;\,\mathrm{eq}} = 2R \log 2$


Thus in equilibrium both definitions coincide and only in 
behavior with time are the two entropies different. 

The Gibbs entropy increases with $s$ (time). This is a
simple consequence of convexity of $x \log x$ and the fact that
$P(m; s + 1)\, 2^{-2R}/W(m)$ is a linear combination

$\sum_n c_n\, P(n; s)\, 2^{-2R}/W(n)$

with $c_n \ge 0$ and $\sum_n c_n = 1$.
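
The monotonic behavior of the Gibbs entropy is easily observed on the model. In the sketch below (ours; floating-point arithmetic, hypothetical function names) the entropy (III.13.3) is evaluated along the Ehrenfest chain, using the fact that $P\,2^{-2R}/W(m) = P/{}^{2R}C_m$. Starting with all balls in A, the curve rises from 0 and levels off at $(2R - 1)\log 2$ rather than at $2R \log 2$, because at any given time only states of one parity are occupied.

```python
from math import comb, log


def ehrenfest_step(dist, two_r):
    """One drawing: dist[m] = Prob{m balls in A}."""
    new = [0.0] * (two_r + 1)
    for m, p in enumerate(dist):
        if m > 0:
            new[m - 1] += p * m / two_r
        if m < two_r:
            new[m + 1] += p * (two_r - m) / two_r
    return new


def gibbs_entropy(dist, two_r):
    """S_G = -sum_m P(m) log[P(m) 2^{-2R} / W(m)], W(m) = C(2R, m) 2^{-2R}."""
    total = 0.0
    for m, p in enumerate(dist):
        if p > 0.0:                      # the term 0 log 0 is taken as 0
            total -= p * log(p / comb(two_r, m))
    return total


def entropy_curve(n0, two_r, steps):
    """Gibbs entropy at times 0, 1, ..., steps, starting from n0 balls in A."""
    dist = [0.0] * (two_r + 1)
    dist[n0] = 1.0
    curve = [gibbs_entropy(dist, two_r)]
    for _ in range(steps):
        dist = ehrenfest_step(dist, two_r)
        curve.append(gibbs_entropy(dist, two_r))
    return curve
```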

The Boltzmann entropy behaves in a much more erratic way 
(as exemplified by the graph on p. 74) and only on the average can 
it exhibit monotonic behavior. 

Incidentally, we see that the monotonic behavior of $S_G$ does
not imply that $P(m; s)$ (better $P(n|m; s)$) approaches $W(m)$. In
fact, we know that $P(m; s)$ does not, in the strict sense, approach
$W(m)$!

All this points to two quite different formulations of the sec- 
ond law of thermodynamics. 

In one formulation (Boltzmann), the notions of state and 
entropy are quite intuitive but the monotonic increase of entropy 
cannot be strictly maintained. 

In the other (Gibbs) the increase of entropy is a rigorous 
theorem but the notions of entropy and approach to equilibrium 
are much less intuitive and direct. 




14. We have discussed the Ehrenfest model in such detail because 
it furnishes an excellent introduction to statistical mechanics 
of irreversible phenomena. However, the model does not help 
us to understand the more basic problem of how probability 
enters classical mechanics. 

To throw some light on this question we shall discuss another
artificial model which in spirit is much closer to “reality.”

On a circle we consider $n$ equidistant points, $m$ of which are
marked and form a set called $S$. The complementary set (of
$n - m$ points) will be called $\bar{S}$.

Each of the $n$ points is a site of a ball which can be either white
(w) or black (b). During an elementary time interval each ball
moves counterclockwise to the nearest site with the following
proviso.

If the ball is in $S$ it changes color upon completing the move
but if it is in $\bar{S}$ it performs the move without changing color.

Suppose that we start with all white balls; the question is what 
happens after a large number of moves. 


14a. Analog of the Classical Solution of Boltzmann. Let
$N_w(t)$ ($N_b(t)$) denote the total number of white (black) balls at time
$t$ (i.e., after $t$ moves; $t$ being an integer) and $N_w(S; t)$ ($N_b(S; t)$) the
number of white (black) balls which are in $S$ at time $t$.

We have the immediate conservation relations:

(III.14.1)  $N_w(t + 1) = N_w(t) - N_w(S; t) + N_b(S; t)$
            $N_b(t + 1) = N_b(t) - N_b(S; t) + N_w(S; t)$

Now to follow Boltzmann, we introduce the assumption
(“Stosszahlansatz”)
(III.14.2)  $N_w(S; t) = m n^{-1} N_w(t)$
            $N_b(S; t) = m n^{-1} N_b(t)$
and obtain

$N_w(t + 1) - N_b(t + 1) = (1 - 2mn^{-1})(N_w(t) - N_b(t))$
Thus 




(III.14.3)  $n^{-1}[N_w(t) - N_b(t)] = (1 - 2mn^{-1})^t\, n^{-1}[N_w(0) - N_b(0)] = (1 - 2mn^{-1})^t$

and hence if

(III.14.4)  $2m < n$


(as we shall assume in the sequel) we obtain a monotonic approach 
to equipartition of white and black balls. 

This conclusion is clearly untenable because our model is
completely “reversible.” In fact starting with all white balls
($N_b(0) = 0$) we continue for a while and then reverse the colors of
the balls; let them move clockwise and move all elements of $S$ one
unit counterclockwise. We shall arrive then at the initial state
contrary to (III.14.3) which ought to hold for the “reversed”
model as well as for the original one.

Moreover, the model is strictly periodic with period 2n 
(Poincaré cycle) which is again incompatible with (III.14.3). 
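
The conflict between (III.14.3) and strict periodicity can be seen directly by running the model. The sketch below is ours: sites are numbered 0 to $n - 1$, "counterclockwise" is taken to mean from site $p$ to site $p + 1$ (mod $n$), colors are coded as +1 (white) and −1 (black), and the function names are hypothetical.

```python
def ring_step(colors, marked):
    """Move every ball one site on; a ball leaving a marked site changes color.

    colors[p] is +1 (white) or -1 (black); marked[p] is True when p is in S.
    """
    n = len(colors)
    new = [0] * n
    for p in range(n):
        new[(p + 1) % n] = -colors[p] if marked[p] else colors[p]
    return new


def excess_curve(marked, steps):
    """n^{-1} (N_w(t) - N_b(t)) for t = 0, ..., steps, starting all white."""
    colors = [1] * len(marked)
    curve = []
    for _ in range(steps + 1):
        curve.append(sum(colors) / len(colors))
        colors = ring_step(colors, marked)
    return curve


def boltzmann_curve(n, m, steps):
    """The monotonic prediction (1 - 2 m / n)^t of (III.14.3)."""
    return [(1 - 2 * m / n) ** t for t in range(steps + 1)]
```

After $n$ moves every ball has passed each of the $m$ marked sites once, so all colors are multiplied by $(-1)^m$; after $2n$ moves the initial state is restored, whatever the set $S$.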


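14b. Probabilistic Treatment. It is perfectly clear that the
origin of this difficulty is in formulas (III.14.2). These formulas
must be reinterpreted and their meaning analyzed more carefully.
Let

(III.14.5)  $\eta_p(t) = \begin{cases} +1 & \text{if the ball at } p\ (1 \le p \le n) \text{ at time } t \text{ is white} \\ -1 & \text{if the ball at } p \text{ at time } t \text{ is black} \end{cases}$

and furthermore

(III.14.6)  $\varepsilon_p = \begin{cases} +1 & \text{if } p \in \bar{S} \\ -1 & \text{if } p \in S \end{cases}$
Then
(III.14.7)  $\eta_p(t) = \eta_{p-1}(t - 1)\, \varepsilon_{p-1}$
and hence
(III.14.8)  $\eta_p(t) = \eta_{p-t}(0)\, \varepsilon_{p-1} \varepsilon_{p-2} \cdots \varepsilon_{p-t}$

(It is clear that $p - k$ should be taken modulo $n$.)
Thus

$N_w(t) - N_b(t) = \sum_p \eta_p(t) = \sum_p \eta_{p-t}(0)\, \varepsilon_{p-1} \varepsilon_{p-2} \cdots \varepsilon_{p-t}$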




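and in the simplest case $N_b(0) = 0$ (i.e., $\eta_p(0) = 1$)
(III.14.9)  $n^{-1}[N_w(t) - N_b(t)] = n^{-1} \sum_p \varepsilon_{p-1} \varepsilon_{p-2} \cdots \varepsilon_{p-t}$
If all the information about the set $S$ is that it contains $m$
elements (i.e., $\sum_p \varepsilon_p = n - 2m$) it seems natural to investigate
(III.14.10)  $n^{-1} \sum_p \varepsilon_{p-1} \varepsilon_{p-2} \cdots \varepsilon_{p-t}$
statistically.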


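Assuming all possible positions of the set $S$ to be equiprobable,
we can calculate the average of (III.14.10) by the obvious formula

(III.14.11)  $\langle \varepsilon_{p-1} \varepsilon_{p-2} \cdots \varepsilon_{p-t} \rangle = ({}^{n}C_m)^{-1} {\sum_{\varepsilon_1, \ldots, \varepsilon_n}}' \varepsilon_{p-1} \cdots \varepsilon_{p-t} = \langle \varepsilon_1 \varepsilon_2 \cdots \varepsilon_t \rangle$

where the prime on the summation sign indicates that the auxiliary
condition
$\sum_p \varepsilon_p = n - 2m$
is to be taken into account. Noting that

$(2\pi i)^{-1} \oint_C \frac{dz}{z^{\,2m - n + \sum_p \varepsilon_p + 1}} = \begin{cases} 1 & \text{if } \sum_p \varepsilon_p = n - 2m \\ 0 & \text{if } \sum_p \varepsilon_p \ne n - 2m \end{cases}$

($C$ is a simple closed curve encircling the origin) we see that

$\langle \varepsilon_1 \varepsilon_2 \cdots \varepsilon_t \rangle = ({}^{n}C_m)^{-1} (2\pi i)^{-1} \oint_C \frac{dz}{z^{\,2m - n + 1}} \sum \frac{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_t}{z^{\,\varepsilon_1 + \varepsilon_2 + \cdots + \varepsilon_n}}$

where the summation inside the integral sign extends now over all
possible choices of the $\varepsilon$’s. Thus

$\sum \frac{\varepsilon_1 \varepsilon_2 \cdots \varepsilon_t}{z^{\,\varepsilon_1 + \cdots + \varepsilon_n}} = (z + z^{-1})^{n-t} (z^{-1} - z)^{t}$

and using (III.14.9), (III.14.11) we obtain

(III.14.12)  $\langle n^{-1}[N_w(t) - N_b(t)] \rangle = ({}^{n}C_m)^{-1} (2\pi i)^{-1} \oint_C (1 - z)^t (1 + z)^t (z^2 + 1)^{n-t} z^{-2m-1}\, dz$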




If we keep $t$ fixed and let $n$ and $m$ approach infinity in such a
way that
(III.14.13)  $\lim m n^{-1} = \mu < \tfrac{1}{2}$
we can apply the method of steepest descent (easily justified in this
simple case) and obtain
(III.14.14)  $\lim \langle n^{-1}(N_w(t) - N_b(t)) \rangle = (1 - 2\mu)^t$
which justifies (III.14.3) “on the average.”

A slightly more complicated calculation will show that the
variance of $n^{-1}(N_w(t) - N_b(t))$ about the mean will approach 0
(as $n^{-1}$) in the limit considered here.

Finally, it can be shown that

(III.14.15)  $\lim \langle n^{-1} N_w(S, t) \rangle = \mu \lim \langle n^{-1} N_w(t) \rangle$

which justifies “on the average” the “Stosszahlansatz” (III.14.2).

Now let us think a little as to what it all means. 

Suppose we plot $n^{-1}(N_w(t) - N_b(t))$ against $t$ for each set $S$.
We obtain ${}^{n}C_m$ “curves”; all start from 1 at $t = 0$ and all have
period $2n$.

Suppose we are going to look on these curves at a fixed point
$t \ll n$. Think of $n$ as being of order $10^{23}$ and $t$ of order $10^{6}$. At $t$
the ordinates of all curves concentrate very strongly about
$(1 - 2\mu)^t$; it takes extremely bad luck to observe a sizeable
deviation from $(1 - 2\mu)^t$!
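
For small $n$ the average over all positions of $S$ can be carried out by plain enumeration. The sketch below rests on an observation which is ours and should be read as an assumption of the example: since $\varepsilon_1 \cdots \varepsilon_t = (-1)^k$ when exactly $k$ of the first $t$ sites belong to $S$, the average is the coefficient of $z^m$ in $(1 - z)^t (1 + z)^{n - t}$ divided by ${}^{n}C_m$. The function names are hypothetical and sites are numbered from 0.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def exact_average(n, m, t):
    """Average of eps_1 ... eps_t over all C(n, m) positions of the set S."""
    total = 0
    for marked in combinations(range(n), m):
        hits = sum(1 for p in marked if p < t)   # marked sites among the first t
        total += (-1) ** hits
    return Fraction(total, comb(n, m))


def coefficient_formula(n, m, t):
    """Coefficient of z^m in (1 - z)^t (1 + z)^(n - t), divided by C(n, m)."""
    coeff = sum((-1) ** k * comb(t, k) * comb(n - t, m - k)
                for k in range(min(t, m) + 1))
    return Fraction(coeff, comb(n, m))
```

With $n = 2000$, $m = 400$ and $t = 5$ the second function already gives a value close to $(1 - 2\mu)^t = 0.6^5$, in line with (III.14.14).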

As convincing as this argument is, it should be remembered 
that it contains an element of arbitrariness in the assumption that 
all positions of the set S are equally probable. Although undoubt- 
edly this assumption can be weakened, the fact remains that some 
such assumption must be made. 

In the Ehrenfests’ terminology, the curve $(1 - 2\mu)^t$ is the
“curve of the H-theorem” (while each individual “curve”
$n^{-1}(N_w(t) - N_b(t))$ is an “H-curve”) or “Verdichtungskurve.”


15. The “ring model” of Section 14 can be discussed further and 
used to clarify the scheme described at the end of the first part of 
Appendix I by Uhlenbeck. 




Before doing this we modify the model slightly in order to facili- 
tate calculations. 

In the preceding formulation, we kept the number $m$ of elements
of the set $S$ fixed. We now relax this condition by the
following device. At each of the $n$ points we perform an experiment
(toss a coin) whose probability of success is $\mu < \tfrac{1}{2}$. According
to the outcome of the experiment (success or failure), we assign
or do not assign the point to the set $S$.

In other words the $\varepsilon$’s are not fixed but are “random variables”
satisfying
(III.15.1)  $\mathrm{Prob}\{\varepsilon_p = -1\} = \mu, \quad \mathrm{Prob}\{\varepsilon_p = 1\} = 1 - \mu$
We assume furthermore that the experiments (tosses) are performed
independently so that the $\varepsilon$’s are “independent random variables.”

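The number of elements in the set $S$ is thus

$m = \tfrac{1}{2} \left( n - \sum_p \varepsilon_p \right)$

which again is a random variable.
The average number of elements is

$\langle m \rangle = \tfrac{1}{2} \left( n - \sum_p \langle \varepsilon_p \rangle \right) = \mu n$
and the variance about the mean is
$\langle (m - \langle m \rangle)^2 \rangle = \langle (m - \mu n)^2 \rangle = \tfrac{1}{4} \left\langle \left( \sum_p (\varepsilon_p - (1 - 2\mu)) \right)^2 \right\rangle = \mu (1 - \mu) n$
Thus $m$ deviates from $\mu n$ by an amount which in an overwhelming
number of cases is of the order $\sqrt{n}$, and we can confidently
expect that the results will not be altered by allowing this
slight leeway.
On this modified model we have

$\langle n^{-1}(N_w(t) - N_b(t)) \rangle = n^{-1} \sum_p \langle \varepsilon_{p-1} \varepsilon_{p-2} \cdots \varepsilon_{p-t} \rangle = \langle \varepsilon_1 \varepsilon_2 \cdots \varepsilon_t \rangle = \langle \varepsilon_1 \rangle \langle \varepsilon_2 \rangle \cdots \langle \varepsilon_t \rangle = (1 - 2\mu)^t$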


in complete agreement with the former result. 
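
For a ring of a few sites the expectation can be computed exactly by summing over all $2^n$ assignments of the $\varepsilon$’s with their product weights. The sketch below is ours (hypothetical names, sites numbered from 0, exact rationals). It also makes visible a restriction which the factorization tacitly uses: the $t$ factors must belong to distinct sites, so the value $(1 - 2\mu)^t$ is obtained for $t \le n$ only.

```python
from fractions import Fraction
from itertools import product


def expected_excess(n, t, mu):
    """E[ n^{-1} sum_p eps_{p-1} ... eps_{p-t} ] for independent eps's.

    Each eps is -1 with probability mu and +1 with probability 1 - mu.
    """
    total = Fraction(0)
    for eps in product((1, -1), repeat=n):
        weight = Fraction(1)
        for e in eps:
            weight *= mu if e == -1 else 1 - mu
        excess = 0
        for p in range(n):
            term = 1
            for k in range(1, t + 1):
                term *= eps[(p - k) % n]     # indices are taken modulo n
            excess += term
        total += weight * Fraction(excess, n)
    return total
```

For $t = 2n$ every $\varepsilon$ enters each product twice and the function returns 1, which is the periodicity of the model reappearing in the averaged description.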




The calculation is now much easier and the asymptotic
evaluation of (III.14.12) is circumvented.

This trick is well known in statistical mechanics. Those familiar
with standard terminology will note that we have replaced
the “microcanonical ensemble” (defined by the rigid constraint
that the number of elements in $S$ be $m$) by a “canonical” (or better
yet “grand-canonical”) ensemble in which the constraint is suitably
relaxed.

What is the Γ-space of this model? It is clearly the set of all
“vectors”

$\eta = (\eta_1, \eta_2, \ldots, \eta_n); \quad \eta_p = \pm 1$
and consists of $2^n$ points.

If

$\rho(\eta; 0) = \rho(\eta_1, \ldots, \eta_n; 0)$

is an initial distribution
$\left( \rho(\eta; 0) \ge 0, \quad \sum_\eta \rho(\eta; 0) = 1 \right)$

and if we know the position of the set $S$ we can write the analog of
Liouville’s equation. It is

(III.15.2)  $\rho(\eta_1, \ldots, \eta_n; t + 1) = \rho(\varepsilon_1 \eta_2, \varepsilon_2 \eta_3, \ldots, \varepsilon_n \eta_1; t)$
We can write this symbolically
$\rho(t + 1) = L\rho(t)$

where $L$, the “Liouville operator”, is a unitary operator.
Now we can write formally

$\rho(t) = L^t \rho(0)$
and averaging over all sets $S$

(III.15.3)  $\langle \rho(t) \rangle = \langle L^t \rangle \rho(0)$

In general, this is very difficult to perform so the physicist replaces
(III.15.3) by the equation

$\langle \rho(t) \rangle = \langle L \rangle^t \rho(0)$


It is this second approach that leads to what we call nowadays 




the “master equation.” Let us analyze what is involved here. 
We give up the description in terms of the operator L and 
replace it by a purely probabilistic description as follows. 
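If at time $t$ the model is in state $\delta$ it can perform a transition
to a state $\eta$ (during an elementary time interval 1) where
$\delta_1 = \varepsilon_1 \eta_2, \quad \delta_2 = \varepsilon_2 \eta_3, \quad \ldots, \quad \delta_n = \varepsilon_n \eta_1$
The probability of the transition $\delta \to \eta$ is the probability that
$\varepsilon_1 = \delta_1 \eta_2, \quad \varepsilon_2 = \delta_2 \eta_3, \quad \ldots, \quad \varepsilon_n = \delta_n \eta_1$
Since the $\varepsilon$’s are independent and satisfy (III.15.1), we get

(III.15.4)  $P(\delta|\eta) = \mathrm{Prob}\{\varepsilon_1 = \delta_1 \eta_2, \ldots, \varepsilon_n = \delta_n \eta_1\} = \prod_{k=1}^{n} \left\{ \tfrac{1}{2} + \tfrac{1}{2} (1 - 2\mu)\, \delta_k \eta_{k+1} \right\} \quad (\eta_{n+1} = \eta_1)$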


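We now replace the Liouville equation (III.15.2) by the “master
equation”

(III.15.5)  $\varphi(\eta; t + 1) = \sum_\delta \varphi(\delta; t)\, P(\delta|\eta)$
with the initial condition
(III.15.6)  $\varphi(\delta; 0) = \rho(\delta; 0) = \rho(\delta)$

The symbol $\varphi$ for the distribution is used because $\varphi(\eta; t)$ is not
necessarily the same as $\rho(\eta; t)$. The “master equation” can be
solved easily by noting that $\varphi(\eta; 0) = \rho(\eta)$ can be written uniquely
in the form

(III.15.6)  $\varphi(\eta; 0) = \rho(\eta) = 2^{-n} + \sum_k c_k \eta_k + \sum_{1 \le k < l \le n} c_{kl}\, \eta_k \eta_l + \cdots + c_{12 \ldots n}\, \eta_1 \eta_2 \cdots \eta_n$

Now,

$\varphi(\eta; 1) = 2^{-n} + (1 - 2\mu) \sum_k c_k \eta_{k+1} + (1 - 2\mu)^2 \sum_{1 \le k < l \le n} c_{kl}\, \eta_{k+1} \eta_{l+1} + \cdots$

and, in general

(III.15.7)  $\varphi(\eta; t) = 2^{-n} + (1 - 2\mu)^t \sum_k c_k \eta_{k+t} + (1 - 2\mu)^{2t} \sum_{1 \le k < l \le n} c_{kl}\, \eta_{k+t} \eta_{l+t} + \cdots$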




It should be noted that the “master equation” is the complete
analog of the equation
(III.15.8)  $P(n|m; s + 1) = P(n|m - 1; s)\, Q(m - 1|m) + P(n|m + 1; s)\, Q(m + 1|m)$
which governs the temporal evolution of the distribution in the 
Ehrenfest model. 
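
The way in which one step of the master equation multiplies each “harmonic” by a power of $(1 - 2\mu)$ and shifts its indices can be verified on a ring of three sites. The sketch below is ours: states are tuples of ±1 indexed from 0, the transition probability is the product formula (III.15.4), and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def transition_prob(delta, eta, mu):
    """P(delta|eta) = prod_k {1/2 + 1/2 (1 - 2 mu) delta_k eta_{k+1}}, k mod n."""
    n = len(delta)
    p = Fraction(1)
    for k in range(n):
        p *= Fraction(1, 2) + Fraction(1, 2) * (1 - 2 * mu) * delta[k] * eta[(k + 1) % n]
    return p


def master_step(phi, mu):
    """One step of the master equation: phi(eta; t+1) = sum_delta phi(delta; t) P(delta|eta)."""
    states = list(phi)
    return {eta: sum(phi[delta] * transition_prob(delta, eta, mu) for delta in states)
            for eta in states}


def all_states(n):
    """All 2^n points of the Gamma-space."""
    return list(product((1, -1), repeat=n))
```

Starting from $2^{-3} + c\,\eta_1$ one step gives $2^{-3} + (1 - 2\mu)\, c\, \eta_2$, and starting from $2^{-3} + c\,\eta_1 \eta_2$ it gives $2^{-3} + (1 - 2\mu)^2 c\, \eta_2 \eta_3$, as the general solution requires.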

The main difference is that while in the Ehrenfest model the 
equation (III.15.8) is a rigorous consequence of the underlying 
probabilistic assumptions, in our model the “master equation” is 
an ad hoc assumption suggested by the Liouville equation and 
intuition. 

The fundamental problem is thus to justify the probabilistic 
treatment on the basis of the Liouville equation (III.15.2). 

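Fortunately we have an exact solution of (III.15.2) namely:

(III.15.9)  $\rho(\eta_1, \ldots, \eta_n; t) = \rho(\ldots, \varepsilon_p \varepsilon_{p+1} \cdots \varepsilon_{p+t-1}\, \eta_{p+t}, \ldots; 0)$
$= 2^{-n} + \sum_p c_p\, \eta_{p+t}\, \varepsilon_p \cdots \varepsilon_{p+t-1} + \sum_{p<q} c_{pq}\, \eta_{p+t} \eta_{q+t}\, \varepsilon_p \cdots \varepsilon_{p+t-1}\, \varepsilon_q \cdots \varepsilon_{q+t-1} + \cdots$

Thus:
(III.15.10)  $\psi(\eta, t) = \langle \rho(\eta, t) \rangle = E\{\rho(\eta, t)\} = 2^{-n} + (1 - 2\mu)^t \sum_p c_p \eta_{p+t} + \sum_{p<q} c_{pq}\, \eta_{p+t} \eta_{q+t}\, \langle \varepsilon_p \cdots \varepsilon_{p+t-1}\, \varepsilon_q \cdots \varepsilon_{q+t-1} \rangle + \cdots$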
This is not exactly what one gets by solving the master equation.
In fact
(III.15.11)  $\langle \varepsilon_p \cdots \varepsilon_{p+t-1}\, \varepsilon_q \cdots \varepsilon_{q+t-1} \rangle = (1 - 2\mu)^{\Delta(p,q;t)}$
where

(III.15.12)  $\Delta(p,q;t) = \begin{cases} 2t, & q - p \ge t \\ 2(q - p), & q - p < t \end{cases}$

and we see that certain terms fail to die out.
To clarify the situation let us restrict ourselves to symmetric
initial distributions, i.e.,
$c_k = c_1, \quad c_{kl} = c_2,$
etc.
Formula (III.15.10) now assumes the form


(III.15.13) (n; t) = <p(n, t)> 


=2-"+ (1 — 2) nN, + C1 2 > (l= a a, 
p 


l1sp<qSn 
eet (HDi, aM Mars Mn 


It is seen that not only do certain terms fail to die out but even the
initial symmetry is destroyed.
Let us consequently symmetrize $\psi(\eta; t)$. We obtain


(II1.15.14) S{p(q, #)} = 2-" + (1 — 2u)'q Sn, 


Dp 


+ 1/"Cy cy 9 > (1 — 2u)4?o") 2 ane T 


lsp<qsn lsp<qsn 
+ (-1)fepe nM Me+ ++ Mn 
For $t$ fixed and $n$ large
$\frac{1}{{}^{n}C_2} \sum_{1 \le p < q \le n} (1 - 2\mu)^{\Delta(p,q;t)} \approx (1 - 2\mu)^{2t}$
etc. and the agreement with the solution of the master equation
becomes visible.
There remain terms like 


(—1) een MN + -e Na 
which do not die out even after symmetrization, but this is not
serious. If the initial distribution is smooth “high harmonics” will
be absent (or nearly absent) in the expansion (III.15.6). In other
words one must deal with “sufficiently coarse-grained” distributions.
Our discussion brings out another difficulty of the “master
equation” approach.
It is clear that the error committed in replacing
$\frac{1}{{}^{n}C_2} \sum_{1 \le p < q \le n} (1 - 2\mu)^{\Delta(p,q;t)}$
by
$(1 - 2\mu)^{2t}$
is of order

$n/{}^{n}C_2$
Similarly the error committed in third order terms is of the order

${}^{n}C_2/{}^{n}C_3$
etc.
Individually these errors approach 0 as $n \to \infty$, but in toto this
need not be the case unless the $c$’s fall off with sufficient rapidity.
We are thus led again to the necessity of assuming that the
initial distribution is sufficiently coarse grained.
A few words ought to be added to explain the meaning of 


symmetrization. 
Consider the contracted distribution
(III.15.15)  $f(\eta_p, \eta_q; t) = {\sum_{\eta}}'' \psi(\eta; t)$

where the double prime on the summation sign indicates that one
sums over all $\eta_k$ except $\eta_p$ and $\eta_q$.
From (III.15.13) one obtains easily

(III.15.16)  $f(\eta_p, \eta_q; t) = \tfrac{1}{4} + (1 - 2\mu)^t c_1 (\eta_p + \eta_q) + c_2 (1 - 2\mu)^{\Delta(p,q;t)}\, \eta_p \eta_q$
or in other words

$\mathrm{Prob}\{\eta_p(t) = \omega_1,\ \eta_q(t) = \omega_2\} = \tfrac{1}{4} + (1 - 2\mu)^t c_1 (\omega_1 + \omega_2) + c_2 (1 - 2\mu)^{\Delta(p,q;t)}\, \omega_1 \omega_2$  ($\omega_1$, $\omega_2$ are either +1 or −1)

Thus the joint probability at two specified points will not approach,
as $t \to \infty$, the limit consistent with the equilibrium distribution.
However the joint probability at two (or more) specified points is
irrelevant.

What one wants is the joint distribution at two points chosen
at random. This joint distribution is precisely the symmetrization
of $f(\omega_1, \omega_2; t)$ (assuming, of course, that all pairs $(p, q)$, $p < q$, are
equiprobable) and one obtains now:

$S\{f(\omega_1, \omega_2; t)\} = \tfrac{1}{4} + (1 - 2\mu)^t c_1 (\omega_1 + \omega_2) + c_2\, \frac{\sum_{1 \le p < q \le n} (1 - 2\mu)^{\Delta(p,q;t)}}{{}^{n}C_2}\, \omega_1 \omega_2$

which in the limit $n \to \infty$ ($t$ fixed) becomes
$\tfrac{1}{4} + (1 - 2\mu)^t c_1 (\omega_1 + \omega_2) + c_2 (1 - 2\mu)^{2t}\, \omega_1 \omega_2$

in agreement with the solution of the master equation. 

If one may be permitted an analogy (admittedly far fetched)
with kinetic theory, one would say that in Boltzmann’s theory
$f(\mathbf{v}_1, \mathbf{v}_2; t)$ should not be interpreted as the joint probability of
velocities of two specified molecules but as the joint probability
of velocities of two molecules chosen at random.


16. An approach to Boltzmann’s theory for a monoatomic dilute
gas without mass motion (i.e., spatially homogeneous) can also be
based on a master equation.

Denote by $\mathbf{v}_1, \ldots, \mathbf{v}_n$ the velocities of the $n$ molecules and
combine them into a “master vector” $\mathbf{R}$ ($3n$-dimensional)

(III.16.1)  $\mathbf{R} = (\mathbf{v}_1, \ldots, \mathbf{v}_n)$

Consider now the process in which during time $dt$ a “collision” can
occur between the $i$th and $j$th particles ($i < j$), while the direction
of the center line (that is, the line joining the centers of the $i$th and
$j$th sphere, in the direction from $i$ to $j$) is $\mathbf{l}$ within $d\mathbf{l}$. The probability
that such a collision takes place is assumed to be of the form

(III.16.2)  $\psi_{ij}\, d\mathbf{l}\, dt = \psi\big( (\mathbf{v}_j - \mathbf{v}_i) \cdot \mathbf{l},\ |\mathbf{v}_j - \mathbf{v}_i| \big)\, d\mathbf{l}\, dt$
For the case of hard spheres
(III.16.3)  $\psi_{ij} = \frac{\delta^2}{2V} \big\{ |(\mathbf{v}_j - \mathbf{v}_i) \cdot \mathbf{l}| - (\mathbf{v}_j - \mathbf{v}_i) \cdot \mathbf{l} \big\}$

which corresponds to Boltzmann’s “Stosszahlansatz.”
If an $(i, j, \mathbf{l})$ collision takes place $\mathbf{R}$ changes into $A_{ij}(\mathbf{l})\mathbf{R}$,
where




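(III.16.4)  $A_{ij}(\mathbf{l})\mathbf{R} = \big( \mathbf{v}_1, \ldots, \mathbf{v}_i + ((\mathbf{v}_j - \mathbf{v}_i) \cdot \mathbf{l})\, \mathbf{l}, \ldots, \mathbf{v}_j - ((\mathbf{v}_j - \mathbf{v}_i) \cdot \mathbf{l})\, \mathbf{l}, \ldots, \mathbf{v}_n \big)$

otherwise $\mathbf{R}$ remains unchanged.
Thus we can say that

$\mathbf{R} \to A_{ij}(\mathbf{l})\mathbf{R}$ with probability $\psi_{ij}\, d\mathbf{l}\, dt$
$\mathbf{R} \to \mathbf{R}$ with probability $1 - dt \sum_{1 \le i < j \le n} \int d\mathbf{l}\, \psi_{ij}$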


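Since each collision preserves momentum and energy, we have

(III.16.5)  $\sum_{j=1}^{n} \mathbf{v}_j = \text{constant}$ 30

(III.16.6)  $\sum_{j=1}^{n} \mathbf{v}_j \cdot \mathbf{v}_j = \text{constant} = n\sigma^2$

We may as well set the constant in (III.16.5) equal to 0.

Thus $\mathbf{R}$ is always confined to a $(3n - 3)$-dimensional sphere
$S_n(\sigma)$ of radius $\sigma\sqrt{n}$. If, at time $t = 0$, we start with a distribution
of points $\mathbf{R}$ given by the density $\varphi(\mathbf{R}; 0)$, it is easily seen that this
distribution will evolve in time according to the equation
(III.16.7)  $\frac{\partial \varphi(\mathbf{R}; t)}{\partial t} = \sum_{1 \le i < j \le n} \int \{ \varphi(A_{ij}(\mathbf{l})\mathbf{R}; t) - \varphi(\mathbf{R}; t) \}\, \psi_{ij}\, d\mathbf{l}$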

1Si<jsn 

This is the master equation which is the analog of (III.15.5)
for the ring model as well as (III.15.8) for the Ehrenfest model.

The derivation of the “master equation” (III.16.7) with $\psi_{ij}$
given by (III.16.3) embodies the basic assumption (“Stosszahlansatz”)
of Boltzmann. Yet (III.16.7) is linear while Boltzmann’s
equation (III.1.6) is not. Clearly, to pass from (III.16.7) to
(III.1.6) additional assumptions are needed.


30 This is incompatible with the presence of the container, since col- 
lisions with the walls do not preserve momentum. This is, however, a 
minor point which can be circumvented by assuming that whenever a 
molecule collides with the wall, it is reintroduced somewhere in V without 
change of velocity. The origin of this slight difficulty is the fact that the 
reduced Boltzmann equation is not strictly valid inasmuch as the container 
introduces an exterior force. 




In order to discuss these assumptions as well as to exhibit
more clearly many other points we shall construct a simplified
mathematical model which embodies many (if not all) of the
essential features of our problem. Let

(III.16.8)  $\mathbf{R} = (x_1, \ldots, x_n)$

be subject to the condition

(III.16.9)  $\|\mathbf{R}\|^2 = x_1^2 + \cdots + x_n^2 = n$

and let

(III.16.10)  $A_{ij}(\theta)\mathbf{R} = (x_1, \ldots, x_i \cos\theta + x_j \sin\theta, \ldots, -x_i \sin\theta + x_j \cos\theta, \ldots, x_n)$

Let furthermore 31

(III.16.11)  $\psi_{ij} = \frac{\nu}{2\pi n} = \text{const}$

The “master equation” now assumes the form

(III.16.12)  $\frac{\partial \varphi(\mathbf{R}; t)}{\partial t} = \frac{\nu}{n} \sum_{1 \le i < j \le n} \frac{1}{2\pi} \int_0^{2\pi} \{ \varphi(A_{ij}(\theta)\mathbf{R}; t) - \varphi(\mathbf{R}; t) \}\, d\theta$

and the analog of (III.1.6) is

(III.16.13)  $\frac{\partial f(x, t)}{\partial t} = \nu \int_{-\infty}^{\infty} dy\, \frac{1}{2\pi} \int_0^{2\pi} \{ f(x \cos\theta + y \sin\theta, t)\, f(-x \sin\theta + y \cos\theta, t) - f(x, t) f(y, t) \}\, d\theta$

The changes made are: (a) we dropped the conservation of momentum
(III.16.5); (b) we simplified the form of $\psi_{ij}$; (c) we replaced
the more complicated six-dimensional rotations $A_{ij}(\mathbf{l})$ by
two-dimensional rotations $A_{ij}(\theta)$.
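
The simplified model is easy to imitate on a computer, and doing so makes condition (III.16.9) concrete: every rotation $A_{ij}(\theta)$ leaves $x_1^2 + \cdots + x_n^2$ unchanged, so a point which starts on the sphere stays on it. The sketch below is ours and follows only the sequence of collisions, not their timing; the pair $(i, j)$ and the angle $\theta$ are drawn uniformly, in keeping with $\psi_{ij}$ = const, and the names and the seed are hypothetical.

```python
import math
import random


def rotate_pair(x, i, j, theta):
    """Apply A_ij(theta): rotate coordinates i and j, leave the others fixed."""
    y = list(x)
    y[i] = x[i] * math.cos(theta) + x[j] * math.sin(theta)
    y[j] = -x[i] * math.sin(theta) + x[j] * math.cos(theta)
    return y


def kac_walk(x, collisions, seed=0):
    """Apply a sequence of random collisions to the master vector x."""
    rng = random.Random(seed)
    for _ in range(collisions):
        i, j = sorted(rng.sample(range(len(x)), 2))   # a pair with i < j
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = rotate_pair(x, i, j, theta)
    return x


def squared_norm(x):
    return sum(v * v for v in x)
```

Starting from `x = [1.0] * n`, which satisfies (III.16.9), the squared norm remains equal to `n` up to rounding, while the sum of the coordinates is not conserved, in accordance with change (a).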

Let us now assume that $\varphi(\mathbf{R}; 0)$ is symmetric in all variables
$x_1, \ldots, x_n$. It then follows that $\varphi(\mathbf{R}; t)$ is also symmetric.


31 Somewhat more generally we could set $\psi_{ij} = \nu f(\theta)/n$ where
$f(-\theta) = f(\theta)$ (“microscopic reversibility”) and $\int_{-\pi}^{\pi} f(\theta)\, d\theta = 1$, $f(\theta) \ge 0$.
The theory would then go through without any serious modifications.
The more general theory is analogous to the theory of the Maxwell gas.




Let us now introduce the following abbreviations

(III.16.14)  $f_1^{(n)}(x; t) = \int_{x_2^2 + \cdots + x_n^2 = n - x^2} \varphi(\mathbf{R}; t)\, d\sigma_1$

(III.16.15)  $f_2^{(n)}(x, y; t) = \int_{x_3^2 + \cdots + x_n^2 = n - x^2 - y^2} \varphi(\mathbf{R}; t)\, d\sigma_2$
etc.

The integrations are over spheres indicated under the integral
signs, the free variables being replaced by $x$, $y$, etc.32 The density
functions $f_1^{(n)}, f_2^{(n)}, \ldots$ will be referred to as contractions of $\varphi$, $f_k^{(n)}$
being the $k$-dimensional contraction. An easy calculation on
(III.16.12) yields

(III.16.16)  $\frac{\partial f_1^{(n)}(x, t)}{\partial t} = \frac{n - 1}{n}\, \nu \int_{-\sqrt{n - x^2}}^{\sqrt{n - x^2}} dy\, \frac{1}{2\pi} \int_0^{2\pi} \{ f_2^{(n)}(x \cos\theta + y \sin\theta, -x \sin\theta + y \cos\theta, t) - f_2^{(n)}(x, y, t) \}\, d\theta$

which is strongly reminiscent of (III.16.13). To get (III.16.13)
one need only assume that

(III.16.17)  $f_2^{(n)}(x, y, t) \sim f_1^{(n)}(x, t)\, f_1^{(n)}(y, t)$

for all $x$, $y$ in the allowable range. One is immediately faced with
the difficulty that since $\varphi(\mathbf{R}; t)$ is uniquely determined by $\varphi(\mathbf{R}; 0)$
no additional assumptions on $\varphi(\mathbf{R}; t)$ can be made unless they can
be deduced from some postulated properties of $\varphi(\mathbf{R}; 0)$.

A moment’s reflection will convince us that in order to derive
(III.16.13) the following theorem ought to be proved first.

BASIC THEOREM. Let $\varphi_n(\mathbf{R}; 0)$ be a sequence of probability
density functions defined on spheres $\|\mathbf{R}\|^2 = x_1^2 + \cdots + x_n^2 = n$
and having the “Boltzmann property”

(III.16.18)  $\lim_{n \to \infty} f_k^{(n)}(x_1, \ldots, x_k; 0) = \prod_{j=1}^{k} \lim_{n \to \infty} f_1^{(n)}(x_j; 0)$


32 It is, of course, understood that $d\sigma_1$, $d\sigma_2$, ... are defined appropriately.




Then the $\varphi_n(\mathbf{R}; t)$ (that is, solutions of (III.16.12)) also have the “Boltzmann
property”:

$\lim_{n \to \infty} f_k^{(n)}(x_1, \ldots, x_k; t) = \prod_{j=1}^{k} \lim_{n \to \infty} f_1^{(n)}(x_j; t)$
In other words, the Boltzmann property propagates in time!
It thus appears that the nonlinear character of Boltzmann’s
equation (III.16.13) is due solely to the extremely special assumption
which the initial distribution has to satisfy. The basic theorem
elucidates what was previously hidden under the assumption
of molecular chaos.
The question arises whether there are density functions having
the Boltzmann property.
Without entering into details, let us state that if $c(x) \ge 0$, if
for some positive $z$

$c(x) \exp(-z x^2)$

is integrable, and if $c(x)$ is subject to mild regularity conditions
one can show that

$\frac{\prod_{j=1}^{n} c(x_j)}{\int_{S_n} \prod_{j=1}^{n} c(x_j)\, d\sigma}$

has the Boltzmann property.


17. We now give a proof of the basic theorem under the assumption
that $\varphi_n(\mathbf{R}; 0)$ is square integrable on $S_n$. This assumption
guarantees that $\varphi_n(\mathbf{R}; 0)$ is sufficiently coarse grained.
Consider the Hilbert space of square integrable real functions
$\psi_n(\mathbf{R})$ defined on the sphere $S_n$, $\|\mathbf{R}\|^2 = n$, and the linear
operator $\Omega$

(III.17.1)  $\Omega\psi_n = \frac{1}{n} \sum_{1 \le i < j \le n} \frac{1}{2\pi} \int_0^{2\pi} \{ \psi_n(A_{ij}(\theta)\mathbf{R}) - \psi_n(\mathbf{R}) \}\, d\theta$

It is easily verified that
(III.17.2)  $(\Omega\psi_n, \chi_n) = (\psi_n, \Omega\chi_n)$



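and that
(III.17.3)  $(\Omega\psi_n, \psi_n) = -\frac{1}{2n} \sum_{1 \le i < j \le n} \frac{1}{2\pi} \int_0^{2\pi} \int_{S_n} \{ \psi_n(A_{ij}(\theta)\mathbf{R}) - \psi_n(\mathbf{R}) \}^2\, d\sigma\, d\theta$

The operator $\Omega$ is thus self-adjoint and clearly bounded (though
the bound may, and indeed does, depend on $n$).
Thus we can write

(III.17.4)  $\varphi_n(\mathbf{R}; t) = \sum_{k=0}^{\infty} \frac{(\nu t)^k}{k!}\, \Omega^k \varphi_n(\mathbf{R}; 0)$

Let $g(\mathbf{R}) = g(x_1)$ be a bounded function of only one variable. We
have from (III.17.2) and (III.17.4)

(III.17.5)  $(\varphi_n(\mathbf{R}; t), g) = \int_{S_n} \varphi_n(\mathbf{R}; t)\, g(x_1)\, d\sigma = \int_{-\sqrt{n}}^{\sqrt{n}} f_1^{(n)}(x, t)\, g(x)\, dx = \sum_{k=0}^{\infty} \frac{(\nu t)^k}{k!}\, (\Omega^k g, \varphi_n(\mathbf{R}; 0))$
Now,
(III.17.6)  $\Omega g = \frac{1}{n} \sum_{j=2}^{n} \frac{1}{2\pi} \int_0^{2\pi} \{ g(x_1 \cos\theta + x_j \sin\theta) - g(x_1) \}\, d\theta$

and setting

(III.17.7)  $g_2(x, y) = \frac{1}{2\pi} \int_0^{2\pi} \{ g(x \cos\theta + y \sin\theta) - g(x) \}\, d\theta$

we have

(III.17.8)  $\Omega g = \frac{1}{n} \sum_{j=2}^{n} g_2(x_1, x_j)$

Further

(III.17.9)  $\Omega^2 g = \frac{1}{n} \sum_{j=2}^{n} \Omega g_2(x_1, x_j)$

and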


III. PROBABILITY IN CLASSICAL STATISTICAL MECHANICS 115 


l l 27 
(III.17.10) Qg (£i £a) = — Pi ga (x cos 0 + a, sin 90, 
n 2r 


— z sin 0 + 2x, cos 0) — ga(£1, £a) } do 


IMs 


Lio a 


ga (xı cos O + x; sin 0, £a) — go(%,, £o) } d0 


oe {ga (£i, % cos O + x, sin 0) — g,(%,, v_)}dd 


ste ahs 


IMs 


Since $\phi_n(R; 0) \ge 0$ and

(III.17.11)  $\int_{S_n}\phi_n(R; 0)\,d\sigma = 1$

and since $|g(R)| < M$, we have $|g_2| < 2M$ and hence

(III.17.12)  $|(\Omega g, \phi_n(R; 0))| < 2M$

Furthermore

(III.17.13)  $|\Omega g_2| < 4M/n + 4M\cdot 2(n-2)/n < 2!\,2^2 M$

and hence

(III.17.14)  $|(\Omega^2 g, \phi_n(R; 0))| < 2!\,2^2 M$

In general,

(III.17.15)  $|(\Omega^k g, \phi_n(R; 0))| < k!\,2^k M$

The foregoing proof is applicable only for $k < n$. However,
the formula remains valid for $k \ge n$, since

$|(\Omega^k g, \phi_n(R; 0))| = |(\Omega^{k-n}\Omega^n g, \phi_n(R; 0))| \le n!\,2^k M < k!\,2^k M$
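Moreover,

(III.17.16)  $\lim_{n\to\infty}(\Omega g, \phi_n(R; 0)) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_2(x, y)\,g_2(x, y)\,dx\,dy$

where

(III.17.17)  $f_2(x, y) = \lim_{n\to\infty} f_2^{(n)}(x, y; 0)$

and in general

(III.17.18)  $\lim_{n\to\infty}(\Omega^k g, \phi_n(R; 0)) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f_{k+1}(x_1, \dots, x_{k+1})\,g_{k+1}(x_1, \dots, x_{k+1})\,dx_1\cdots dx_{k+1}$

where the $g_k$ are defined inductively as follows:

(III.17.19)  $g_1(x) = g(x)$,
    $g_{k+1}(x_1, \dots, x_{k+1}) = \sum_{j=1}^{k}\frac{1}{2\pi}\int_0^{2\pi}\{g_k(x_1, \dots, x_j\cos\theta + x_{k+1}\sin\theta, \dots, x_k) - g_k(x_1, \dots, x_k)\}\,d\theta$

Since we have assumed that $\phi_n(R; 0)$ has the Boltzmann property,
we have

(III.17.20)  $\lim_{n\to\infty}(\Omega^k g, \phi_n(R; 0)) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1)\cdots f(x_{k+1})\,g_{k+1}(x_1, \dots, x_{k+1})\,dx_1\cdots dx_{k+1}$

where

(III.17.21)  $f(x) = f(x; 0) = \lim_{n\to\infty} f_1^{(n)}(x; 0)$

From (III.17.5), (III.17.15), and (III.17.20) it follows that for
$0 \le t < 1/2\nu$

(III.17.22)  $\int_{-\infty}^{+\infty} f(x, t)\,g(x)\,dx = \sum_{k=0}^{\infty}\frac{\nu^k t^k}{k!}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1)\cdots f(x_{k+1})\,g_{k+1}(x_1, \dots, x_{k+1})\,dx_1\cdots dx_{k+1}$

where 33

$f(x, t) = f_1(x, t) = \lim_{n\to\infty} f_1^{(n)}(x, t)$

33 It should be clear that the limit is to be understood in the weak
sense in the space $L(-\infty, +\infty)$.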


Starting now from a function $\gamma_2(x_1, x_2) = g(x_1)h(x_2)$ and defining
$\gamma_k(x_1, \dots, x_k)$ inductively by formula (III.17.19) we obtain again
for $0 \le t < 1/2\nu$

(III.17.23)  $\iint f_2(x_1, x_2; t)\,g(x_1)\,h(x_2)\,dx_1\,dx_2 = \sum_{k=0}^{\infty}\frac{\nu^k t^k}{k!}\int\cdots\int f(x_1)\cdots f(x_{k+2})\,\gamma_{k+2}(x_1, \dots, x_{k+2})\,dx_1\cdots dx_{k+2}$

It is easily checked that

(III.17.24)  $\gamma_3(x_1, x_2, x_3) = h_1(x_2)\,g_2(x_1, x_3) + g_1(x_1)\,h_2(x_2, x_3)$

(III.17.25)  $\gamma_4(x_1, x_2, x_3, x_4) = h_2(x_2, x_4)\,g_2(x_1, x_3) + h_1(x_2)\,g_3(x_1, x_3, x_4) + g_2(x_1, x_4)\,h_2(x_2, x_3) + g_1(x_1)\,h_3(x_2, x_3, x_4)$

etc.
Thus, for instance

(III.17.26)  $\int\cdots\int f(x_1)f(x_2)f(x_3)f(x_4)\,\gamma_4(x_1, x_2, x_3, x_4)\,dx_1\,dx_2\,dx_3\,dx_4$
    $= \Big(\int f(x)h(x)\,dx\Big)\Big(\iiint f(x_1)f(x_2)f(x_3)\,g_3(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3\Big)$
    $+ 2\Big(\iint f(x_1)f(x_2)\,g_2(x_1, x_2)\,dx_1\,dx_2\Big)\Big(\iint f(x_1)f(x_2)\,h_2(x_1, x_2)\,dx_1\,dx_2\Big)$
    $+ \Big(\int f(x)g(x)\,dx\Big)\Big(\iiint f(x_1)f(x_2)f(x_3)\,h_3(x_1, x_2, x_3)\,dx_1\,dx_2\,dx_3\Big)$

and since similar formulas hold it is seen that for $0 \le t < 1/2\nu$,

(III.17.27)  $\int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} f_2(x_1, x_2; t)\,g(x_1)\,h(x_2)\,dx_1\,dx_2 = \int_{-\infty}^{+\infty} f(x_1; t)\,g(x_1)\,dx_1\int_{-\infty}^{+\infty} f(x_2; t)\,h(x_2)\,dx_2$

Since g and h are arbitrary we have for $0 \le t < 1/2\nu$,

(III.17.28)  $f_2(x_1, x_2; t) = f(x_1; t)\,f(x_2; t)$

By a similar, but more tedious, argument we also get

(III.17.29)  $f_k(x_1, \dots, x_k; t) = \prod_{j=1}^{k} f(x_j; t)$


The restriction on t can now be removed by observing that it does
not depend on the initial distribution.

In fact, we can start with some $t_0$, $0 < t_0 < (2\nu)^{-1}$, and by
repeating the argument extend the proof of Boltzmann’s property
to the range $t_0 \le t < t_0 + (2\nu)^{-1}$. Proceeding this way we can
clearly cover the whole time range $0 \le t < \infty$.

The foregoing proof suffers from the defect that it works only 
if the restriction on time is independent of the initial distribution. 
It is inapplicable to the physically significant case of hard spheres. 
A general proof that Boltzmann’s property propagates in time is 
still lacking. 
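
The random walk whose density obeys the master equation can be simulated directly, which gives a concrete feel for what the theorem is about. The following is a minimal sketch and not part of the proof. It assumes the elementary step that can be read off from (III.17.10): a pair (i, j) is chosen at random and $(x_i, x_j)$ is replaced by $(x_i\cos\theta + x_j\sin\theta,\ -x_i\sin\theta + x_j\cos\theta)$ with $\theta$ uniform on $(0, 2\pi)$. The function name `kac_walk`, the seed, and the sizes are illustrative choices, and the random waiting times that set the time scale $\nu$ are omitted.

```python
import math
import random


def kac_walk(x, steps, rng):
    """Apply `steps` random pair rotations A_ij(theta) to the list x, in place."""
    n = len(x)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)            # a random pair with i != j
        theta = rng.uniform(0.0, 2.0 * math.pi)   # uniformly distributed angle
        c, s = math.cos(theta), math.sin(theta)
        xi, xj = x[i], x[j]
        x[i] = xi * c + xj * s
        x[j] = -xi * s + xj * c
    return x


rng = random.Random(0)          # fixed seed (arbitrary) for reproducibility
n = 200
x = [1.0] * n                   # a point of the sphere x_1^2 + ... + x_n^2 = n
kac_walk(x, 20 * n, rng)

energy = sum(v * v for v in x)        # conserved by every rotation
fourth = sum(v ** 4 for v in x) / n   # empirical fourth moment of one coordinate
print(energy, fourth)
```

Each step is a rotation in one coordinate plane, so the point never leaves the sphere and `energy` stays equal to n up to rounding. The starting point has fourth moment exactly 1; after many steps the empirical fourth moment should drift toward the Gaussian value 3, in line with the Maxwell-Boltzmann limit of Section 18.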


18. We now discuss the H-theorem and the question of approach 
to equilibrium. 
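Starting with the master equation (III.16.12)

(III.18.1)  $\frac{\partial\phi}{\partial t} = \frac{\nu}{n}\sum_{1\le i<j\le n}\frac{1}{2\pi}\int_0^{2\pi}\{\phi(A_{ij}(\theta)R; t) - \phi(R; t)\}\,d\theta$

we obtain (see (III.17.3))

(III.18.2)  $\frac{d}{dt}\int_{S_n}\phi^2(R; t)\,d\sigma = -\frac{\nu}{n}\sum_{1\le i<j\le n}\frac{1}{2\pi}\int_{S_n}\int_0^{2\pi}\{\phi(A_{ij}(\theta)R; t) - \phi(R; t)\}^2\,d\theta\,d\sigma$

and hence

(III.18.3)  $\frac{d}{dt}\int_{S_n}\phi^2(R; t)\,d\sigma \le 0$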


Furthermore, if $\alpha > 1$, we have by Hölder’s inequality

(III.18.4)  $\int\phi^{\alpha-1}(R)\,\phi(A_{ij}(\theta)R)\,d\sigma \le \Big(\int\phi^{\alpha}(R)\,d\sigma\Big)^{(\alpha-1)/\alpha}\Big(\int\phi^{\alpha}(A_{ij}(\theta)R)\,d\sigma\Big)^{1/\alpha} = \int\phi^{\alpha}(R)\,d\sigma$

and hence

(III.18.5)  $\frac{d}{dt}\int\phi^{\alpha}(R, t)\,d\sigma \le 0$

Since

(III.18.6)  $\phi\log\phi = \lim_{\alpha\to 1}\frac{\phi^{\alpha} - \phi}{\alpha - 1}$

we obtain as a corollary of (III.18.5) that

(III.18.7)  $\frac{d}{dt}\int\phi(R, t)\log\phi(R, t)\,d\sigma \le 0$

The equality in (III.18.3), (III.18.5), and (III.18.7) occurs only if

(III.18.8)  $\phi(R, t) = \text{const} = 1/S_n(\sqrt{n})$

where $S_n(\sqrt{n})$ denotes the surface of the sphere $\|R\|^2 = n$.
The one-dimensional contraction of (III.18.8) is readily seen
to be

(III.18.9)  $\dfrac{(1 - x^2/n)^{(n-3)/2}}{\int_{-\sqrt{n}}^{\sqrt{n}}(1 - x^2/n)^{(n-3)/2}\,dx}$

which in the limit $n\to\infty$ becomes the Maxwell-Boltzmann density

(III.18.10)  $(2\pi)^{-1/2}\exp(-\tfrac{1}{2}x^2)$


(In this connection see Example 1 of Chapter I.) We must now
prove that 34

(III.18.11)  $\phi(R, t) \to 1/S_n(\sqrt{n})$

as $t\to\infty$, at least in the weak sense, that is, for every $\chi(R) \in L^2(S_n)$

(III.18.12)  $\lim_{t\to\infty}\int\phi(R, t)\,\chi(R)\,d\sigma = \int\chi(R)\,d\sigma \Big/ S_n(\sqrt{n})$

34 This is simply the ergodic property of the Markoff process under
consideration. Rather than appeal to general theorems we prefer to
keep the exposition self-contained and provide a proof which, in this case,
is very simple.


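Since the master equation is of the form

(III.18.13)  $\frac{\partial\phi}{\partial t} = \Omega\phi$

where $\Omega$ (taken here to include the factor $\nu$) is a bounded,
self-adjoint, negative operator, we have

(III.18.14)  $(\phi, \chi) = \int\phi(R, t)\,\chi(R)\,d\sigma = \int_{-\infty}^{0}\exp(\lambda t)\,d_{\lambda}(E(\lambda)\phi(R, 0), \chi(R))$

where $E(\lambda)$ are the projection operators involved in the resolution
of the identity of the operator $\Omega$.
The function

(III.18.15)  $\rho(\lambda) = (E(\lambda)\phi(R, 0), \chi(R))$

is of bounded variation and since $\Omega$ is bounded $\rho(\lambda)$ is constant for
sufficiently large negative $\lambda$.
Thus

(III.18.16)  $\frac{d}{dt}\int\phi(R, t)\,\chi(R)\,d\sigma = \int_{-\infty}^{0}\lambda\exp(\lambda t)\,d\rho(\lambda)$

and consequently

(III.18.17)  $\lim_{t\to\infty}\frac{d}{dt}\int\phi(R, t)\,\chi(R)\,d\sigma = 0$

From (III.18.3) it follows that a sequence $t_s\to\infty$ exists such that
$\phi(R, t_s)$ converges weakly to a function $\phi_0(R)$, that is,

(III.18.18)  $\lim_{s\to\infty}\int\phi(R, t_s)\,\chi(R)\,d\sigma = \int\phi_0(R)\,\chi(R)\,d\sigma$

and

(III.18.19)  $\lim_{s\to\infty}\int\phi(R; t_s)\,\chi(A_{ij}(\theta)R)\,d\sigma = \int\phi_0(R)\,\chi(A_{ij}(\theta)R)\,d\sigma$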


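Since

(III.18.20)  $\frac{d}{dt}\int\phi(R, t)\,\chi(R)\,d\sigma = \frac{\nu}{n}\sum_{1\le i<j\le n}\frac{1}{2\pi}\int_0^{2\pi}d\theta\int_{S_n}d\sigma\,\phi(R, t)\{\chi(A_{ij}(\theta)R) - \chi(R)\}$

it follows from (III.18.17) (by letting $t\to\infty$ through the sequence
$t_s$) that

(III.18.21)  $0 = \frac{\nu}{n}\sum_{1\le i<j\le n}\frac{1}{2\pi}\int_0^{2\pi}d\theta\int_{S_n}d\sigma\,\phi_0(R)\{\chi(A_{ij}(\theta)R) - \chi(R)\}$
    $= \int_{S_n}d\sigma\,\chi(R)\,\frac{\nu}{n}\sum_{1\le i<j\le n}\frac{1}{2\pi}\int_0^{2\pi}\{\phi_0(A_{ij}(\theta)R) - \phi_0(R)\}\,d\theta$

and since $\chi(R)$ is arbitrary we have

(III.18.22)  $\sum_{1\le i<j\le n}\frac{1}{2\pi}\int_0^{2\pi}\{\phi_0(A_{ij}(\theta)R) - \phi_0(R)\}\,d\theta = 0$

Multiplying both sides of (III.18.22) by $\phi_0(R)$ and integrating over
$S_n$ we obtain

(III.18.23)  $\sum_{1\le i<j\le n}\int_{S_n}d\sigma\int_0^{2\pi}\{\phi_0(A_{ij}(\theta)R) - \phi_0(R)\}^2\,d\theta = 0$

and hence

(III.18.24)  $\phi_0(A_{ij}(\theta)R) = \phi_0(R)$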


for almost every $\theta$ and almost every R. To prove that (III.18.24)
implies that $\phi_0(R)$ = constant almost everywhere, we need the
fact that the $A_{ij}(\theta)$ generate a transitive subgroup of the full
n-dimensional rotation group. This is almost trivial because
starting with

(III.18.25)  $T = (\xi_1, \xi_2, \dots, \xi_n) \in S_n$

we can by an appropriate $A_{12}(\theta_1)$ turn (III.18.25) into

(III.18.26)  $(\sqrt{\xi_1^2 + \xi_2^2},\ 0,\ \xi_3, \dots, \xi_n)$

An appropriate $A_{13}(\theta_2)$ will turn (III.18.26) into

(III.18.27)  $(\sqrt{\xi_1^2 + \xi_2^2 + \xi_3^2},\ 0,\ 0,\ \xi_4, \dots, \xi_n)$

and proceeding this way we see that an appropriate product

(III.18.28)  $A_{1n}(\theta_{n-1})\cdots A_{13}(\theta_2)\,A_{12}(\theta_1)$

will turn (III.18.25) into $(\sqrt{n}, 0, \dots, 0) = R_0$. Assuming that
$\phi_0(R_0)$ is defined we see that

(III.18.29)  $\phi_0(T) = \phi_0(R_0)$

provided, of course, T is such that $\phi_0(T)$ is defined and the angles
$\theta_1, \dots, \theta_{n-1}$ (which clearly depend on T) do not belong to the
exceptional sets of measure 0. It is clear that for almost every T
the angles $\theta_1, \dots, \theta_{n-1}$ will not lie in the exceptional set and hence

(III.18.30)  $\phi_0(R) = \text{const} = 1/S_n(\sqrt{n})$


almost everywhere. 
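
The transitivity argument is constructive, and it may help to see it carried out numerically. The sketch below is only an illustration: the function name `rotate_to_pole` and the sample vector are our own, and the angle of each plane rotation is taken to be atan2 of the two coordinates involved, which is one convenient choice that zeroes the second of them.

```python
import math


def rotate_to_pole(t):
    """Turn t into (|t|, 0, ..., 0) by successive rotations in the (1, j) planes.

    Returns the rotated vector and the list of angles theta_1, ..., theta_{n-1}.
    """
    x = list(t)
    angles = []
    for j in range(1, len(x)):
        theta = math.atan2(x[j], x[0])   # chosen so that coordinate j becomes 0
        c, s = math.cos(theta), math.sin(theta)
        x0, xj = x[0], x[j]
        x[0] = x0 * c + xj * s           # equals sqrt(x0^2 + xj^2)
        x[j] = -x0 * s + xj * c          # equals 0 up to rounding
        angles.append(theta)
    return x, angles


T = [1.0, -2.0, 0.5, 3.0]                # arbitrary sample point
pole, thetas = rotate_to_pole(T)
print(pole, thetas)
```

After the (j-1)th rotation the first coordinate equals the square root of the sum of the squares of the first j original coordinates, exactly as in (III.18.26) and (III.18.27), and the n - 1 angles returned depend on T, as remarked in the text.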
Since $\phi_0(R)$ is unique, it follows that

(III.18.31)  $\lim_{t\to\infty}\phi(R, t) = \phi_0(R)$

where the limit is taken in the weak sense. The above analysis
goes through without any modifications for the general master
equation (III.16.7) except that the proof that the $A_{ij}(\mathbf{l})$ generate
a transitive subgroup of the orthogonal group in 3n - 3 dimensions
is trickier.

We have thus shown that the master density $\phi(R; t)$ approaches,
as $t\to\infty$, the equilibrium density

(III.18.32)  $\phi_0(R) = 1/S_n(\sqrt{n})$

and that the approach is “irreversible” as implied by (III.18.3),
(III.18.5), or (III.18.7). From the fact that the master density
approaches (III.18.32) and from the fact that the one-dimensional
contraction of (III.18.32) is (III.18.9), it follows (again in the
weak sense) that

(III.18.33)  $\lim_{t\to\infty} f_1^{(n)}(x, t) = \dfrac{(1 - n^{-1}x^2)^{\frac{1}{2}(n-3)}}{\int_{-\sqrt{n}}^{\sqrt{n}}(1 - n^{-1}x^2)^{\frac{1}{2}(n-3)}\,dx} \sim (2\pi)^{-1/2}\exp(-\tfrac{1}{2}x^2)$
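
The passage from the one-dimensional contraction to the Maxwell-Boltzmann density is easy to check numerically. The sketch below is illustrative only. It assumes the closed form $\sqrt{n}\,B(\tfrac12, \tfrac{n-1}{2})$ for the normalizing integral, which follows from the substitution $x = \sqrt{n}\,u$; the function names and the grid are our own choices.

```python
import math


def contraction_density(x, n):
    """Normalized one-dimensional contraction of the uniform density on |R|^2 = n."""
    if x * x >= n:
        return 0.0
    # log of the integral of (1 - x^2/n)^((n-3)/2) over (-sqrt(n), sqrt(n)),
    # which equals sqrt(n) * B(1/2, (n-1)/2)
    log_norm = (0.5 * math.log(n) + math.lgamma(0.5)
                + math.lgamma(0.5 * (n - 1)) - math.lgamma(0.5 * n))
    return math.exp(0.5 * (n - 3) * math.log1p(-x * x / n) - log_norm)


def maxwell(x):
    """Maxwell-Boltzmann density (2 pi)^(-1/2) exp(-x^2 / 2)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def max_gap(n, points=601, half_width=3.0):
    """Largest absolute difference between the two densities on a grid."""
    xs = [-half_width + 2.0 * half_width * k / (points - 1) for k in range(points)]
    return max(abs(contraction_density(x, n) - maxwell(x)) for x in xs)


for n in (10, 100, 1000):
    print(n, max_gap(n))
```

The printed gaps should shrink roughly like 1/n, which is the rate one expects from expanding the logarithm of $(1 - x^2/n)^{(n-3)/2}$.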
We shall now discuss the relation of Boltzmann’s famous 
H-theorem to our development. 
The H-theorem asserts that

(III.18.34)  $\frac{d}{dt}\int_{-\infty}^{+\infty} f(x, t)\log f(x, t)\,dx \le 0$

and is easily derivable from (III.16.13) by following Boltzmann’s
original derivation. However, in contrast with the statements
(III.18.3), (III.18.5), and (III.18.7) (which can be generalized
further by replacing $\phi^{\alpha}$ or $\phi\log\phi$ by $M(\phi)$ provided M is concave
upward) the functional

(III.18.35)  $H(f) = \int f\log f\,dx$

is the only one, discovered so far, which exhibits the monotonic
behavior.

To elucidate this situation we must recall that (III.16.13) is 
applicable only to distributions having Boltzmann’s property. 
If, in some sense, we could say that

(III.18.36)  $\phi_n(R, t) \sim \dfrac{\prod_{j=1}^{n} f(x_j, t)}{\int_{S_n}\prod_{j=1}^{n} f(x_j, t)\,d\sigma}$

we would have

(III.18.37)  $\int_{S_n}\phi_n\log\phi_n\,d\sigma \sim \frac{1}{C_n}\int_{S_n}\prod_{j=1}^{n} f(x_j, t)\Big(\sum_{k=1}^{n}\log f(x_k, t) - \log C_n\Big)\,d\sigma$
    $= -\log C_n + \frac{n}{C_n}\int_{S_n}d\sigma\,\log f(x_1, t)\prod_{j=1}^{n} f(x_j, t)$

where

(III.18.38)  $C_n = \int_{S_n}\prod_{j=1}^{n} f(x_j, t)\,d\sigma$

and $f(x, t)$ is the limiting one-dimensional contraction of $\phi_n(R, t)$.
Asymptotically, for large n, we would have

(III.18.39)  $\int_{S_n}\phi_n\log\phi_n\,d\sigma \sim -\log C_n + n\int_{-\infty}^{\infty} f(x, t)\log f(x, t)\,dx$

and from the fact that

(III.18.40)  $\int_{S_n}\phi_n\log\phi_n\,d\sigma$

decreases in time it would follow that so does

(III.18.41)  $H(f) = \int f\log f\,dx$

If the foregoing steps could be made rigorous, we would have a
thoroughly satisfactory derivation of Boltzmann’s H-theorem.


19. In Sections 16, 17, and 18 we have shown that a probabilistic 
approach based on the random walk picture described in Part 1 of 
Appendix I by Uhlenbeck is indeed possible and leads to results 
which are in complete agreement with intuition and prior know- 
ledge. 

There remains however the basic problem of justifying the 
master equation on the basis of Liouville’s equation. In other 
words what is needed is the kind of analysis we have performed 
for the ring model in the latter part of Section 15. 

Such an analysis was undertaken by R. Brout as mentioned 
in Appendix I, Part 1. 

Since I consider Brout’s attempt to be highly significant 
(although I am well aware that many points need further elucida- 
tion!), I shall attempt to summarize what I think are the principal 
points of Brout’s approach. 

Let the reader be warned at the outset that we are leaving for 
a while the safe ground of mathematics and are embarking upon 
an excursion into a different realm. Bear with me. The ideas 
are sound even though a thoroughly rigorous execution may prove 
to be very difficult. 

Let us start with some notation. 


If

$D(v_1, \dots, v_N, r_1, \dots, r_N; 0)$

denotes an initial distribution in $\Gamma$-space (we use now positions $r_j$
and velocities $v_j$ as our variables) then

$D(v_1, \dots, v_N, r_1, \dots, r_N; t) = D(v_1^{-t}, \dots, v_N^{-t}, r_1^{-t}, \dots, r_N^{-t}; 0)$

where

$v_1^{-t}, \dots, v_N^{-t}, r_1^{-t}, \dots, r_N^{-t}$

are velocities and positions of our molecules (monoatomic!) at
time $-t$ if they were $v_1, \dots, v_N, r_1, \dots, r_N$ at time 0.

Suppose now that instead of our system we had a system in
which the intermolecular forces $F_{ij} = F(|r_i - r_j|)$ were to act
only among a specified group of molecules (1, 3, and 5, say) while
all the others were to become “uncoupled” from each other and
from the specified ones.

In other words the potential energy instead of being

$\sum_{1\le i<j\le N}\Phi(|r_i - r_j|) + \sum_{j=1}^{N} U(r_j)$

would be

$\sum^{*}\Phi(|r_i - r_j|) + \sum_{j=1}^{N} U(r_j)$

where the asterisk * on the summation sign indicates that both
i and j belong to the specified group.
We shall denote by

$D(1, 2, \dots, i_1', \dots, i_2', \dots, i_s', \dots, N)$

the density at time t which evolved from

$D(v_1, \dots, v_N, r_1, \dots, r_N; 0)$

if intermolecular forces were “switched on” only for particles
$i_1, i_2, \dots, i_s$.

In particular

$D(1', 2', \dots, N') = D(v_1, \dots, v_N, r_1, \dots, r_N; t)$

$D(1, 2, \dots, N) = D([v_1^{-t}], \dots, [v_N^{-t}], [r_1^{-t}], \dots, [r_N^{-t}]; 0)$

Putting square brackets [ ] around $v_j^{-t}$ and $r_j^{-t}$ indicates that the
jth particle moves only under the influence of the outside force
$-\operatorname{grad} U(r_j)$.

If there are no outside forces we have

(III.19.1)  $[v_j^{-t}] = v_j$,  $[r_j^{-t}] = r_j - v_j t$


Of course, walls of the container give rise to (singular!) outside 
forces but we shall treat the walls in a “cavalier fashion” by for- 
getting them in the sense of maintaining formulas (III.19.1) but 
remembering them when it comes to integration with respect to 
the r,’s. 

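To be sure that we understand the notation note that

$D(1, 2, \dots, i', \dots, N) = D(1, 2, \dots, N)$

and, e.g.,

$D(1', 2', 3', 4, \dots, N)$

is obtained by finding $v_1^{-t}, v_2^{-t}, v_3^{-t}, r_1^{-t}, r_2^{-t}, r_3^{-t}$ through solving a
genuine three-body problem while $v_j^{-t}$ and $r_j^{-t}$ (j > 3) will be in
square brackets. In other words,

$D(1', 2', 3', 4, \dots, N) = D(v_1^{-t}, v_2^{-t}, v_3^{-t}, [v_4^{-t}], \dots, [v_N^{-t}], r_1^{-t}, r_2^{-t}, r_3^{-t}, [r_4^{-t}], \dots, [r_N^{-t}]; 0)$

Now set

$C^0 = D(1, 2, \dots, N)$

$C_i^1 = D(1, 2, \dots, i', \dots, N) - D(1, 2, \dots, N) = 0$

$C_{ij}^2 = D(1, 2, \dots, i', \dots, j', \dots, N) - D(1, 2, \dots, i', \dots, N) - D(1, 2, \dots, j', \dots, N) + D(1, 2, \dots, N)$

$C_{ijk}^3 = D(1, 2, \dots, i', \dots, j', \dots, k', \dots, N)$
    $- D(1, 2, \dots, i', \dots, j', \dots, N) - D(1, 2, \dots, i', \dots, k', \dots, N) - D(1, 2, \dots, j', \dots, k', \dots, N)$
    $+ D(1, 2, \dots, i', \dots, N) + D(1, 2, \dots, j', \dots, N) + D(1, 2, \dots, k', \dots, N)$
    $- D(1, 2, \dots, N)$

etc. It is now a matter of simple counting to verify that

(III.19.2)  $D(v_1, \dots, v_N, r_1, \dots, r_N; t) = D(1', 2', \dots, N') = C^0 + \sum_{i=1}^{N} C_i^1 + \sum_{1\le i<j\le N} C_{ij}^2 + \sum_{1\le i<j<k\le N} C_{ijk}^3 + \cdots$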
This is a nearly trivial identity but it gives a decomposition in 
which the terms have an immediate and highly pertinent inter- 
pretation. 
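
The “simple counting” behind (III.19.2) is an inclusion-exclusion identity, and it holds for any assignment of numbers to subsets of particles. The sketch below checks this for a small N. It is an illustration under our own conventions: `D` is a dictionary giving an arbitrary integer to each subset of “switched on” particles (it stands in for the densities of the text, whose physical meaning plays no role in the identity), and the cluster term of a subset S is taken to be the alternating sum of D over the subsets of S, which reproduces the definitions of $C^0$, $C^1_i$, $C^2_{ij}$, $C^3_{ijk}$ above.

```python
import itertools
import random


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in itertools.combinations(items, r):
            yield frozenset(combo)


def cluster_terms(D, particles):
    """C[S] = sum over T contained in S of (-1)^(|S| - |T|) * D[T]."""
    C = {}
    for S in subsets(particles):
        C[S] = sum((-1) ** (len(S) - len(T)) * D[T] for T in subsets(S))
    return C


N = 5
particles = range(1, N + 1)
rng = random.Random(1)                          # arbitrary seed
D = {S: rng.randint(-10, 10) for S in subsets(particles)}   # stand-in values

C = cluster_terms(D, particles)
total = sum(C.values())
full = frozenset(particles)
print(total, D[full])
```

The sum of all cluster terms equals the value attached to the full set, whatever numbers are used. In the physical setting the terms with a single switched-on particle vanish, because one particle has no partner to interact with; that is a property of the dynamics and not of the identity.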
For instance let us take a look at

$C_{123}^3$

Suppose that $r_1, r_2, r_3, v_1, v_2, v_3$ were such that in the time
interval $(-t, 0)$ the particles 1, 2, 3 did not interact.

Remember, we assume that intermolecular forces are short
ranged and it is possible that during the time interval $(-t, 0)$ the
particles did not come within each other’s sphere of influence. Or
suppose that only particles 1 and 2 “felt” each other.

It is clear that for such initial configurations $C_{123}^3 = 0$. In
fact, it is easily seen that

$C_{123}^3 = 0$

unless all three particles have “felt” each other.
We now restrict ourselves to the gas of hard spheres and make 
the crucial assumptions that: 


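(a)  $D(v_1, \dots, v_N, r_1, \dots, r_N; 0) = \phi(v_1, \dots, v_N)\,\gamma(r_1, \dots, r_N)$

(b)  $\gamma(r_1, \dots, r_N) = \dfrac{\prod_{1\le i<j\le N}\{1 - \varepsilon(|r_i - r_j|)\}}{\int_V\cdots\int_V\prod_{1\le i<j\le N}\{1 - \varepsilon(|r_i - r_j|)\}\,dr_1\cdots dr_N}$

where $\varepsilon(x) = 1$ for $0 < x < \delta$ and $\varepsilon(x) = 0$ otherwise
($\delta$ = diameter of molecule).

Assumption (b) means that we start from spatial equilibrium.
Both assumptions (a) and (b) are made only for t = 0.
We now wish to calculate:

$\phi(v_1, \dots, v_N; t) = \int_V\cdots\int_V D(v_1, \dots, v_N, r_1, \dots, r_N; t)\,dr_1\cdots dr_N$

For this purpose we use the “combinatorial” expansion (III.19.2)
and observe that

$\int_V\cdots\int_V C^0\,dr_1\cdots dr_N = \phi(v_1, \dots, v_N) = \phi(v_1, \dots, v_N; 0)$

Since $C_i^1 = 0$, we begin by looking at

$\int_V\cdots\int_V C_{ij}^2\,dr_1\cdots dr_N$

To simplify writing we set i = 1, j = 2 and note that

$\int_V\cdots\int_V C_{12}^2\,dr_1\cdots dr_N$
    $= \int_V\cdots\int_V\{D(1', 2', 3, 4, \dots, N) - D(1, 2, \dots, N)\}\,dr_1\cdots dr_N$
    $= \int_V\cdots\int_V\{\phi(v_1^{-t}, v_2^{-t}, [v_3^{-t}], \dots, [v_N^{-t}])\,\gamma(r_1^{-t}, r_2^{-t}, [r_3^{-t}], \dots, [r_N^{-t}])$
    $\qquad - \phi([v_1^{-t}], [v_2^{-t}], \dots, [v_N^{-t}])\,\gamma([r_1^{-t}], [r_2^{-t}], \dots, [r_N^{-t}])\}\,dr_1\cdots dr_N$
    $= \int_V\cdots\int_V\{\phi(v_1^{-t}, v_2^{-t}, v_3, \dots, v_N)\,\gamma(r_1, r_2, \dots, r_N) - \phi(v_1, v_2, \dots, v_N)\,\gamma(r_1, r_2, \dots, r_N)\}\,dr_1\cdots dr_N$
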
where in the last step we have made use of (III.19.1). Remember
I warned you we were going to treat the walls in a “cavalier
fashion” whenever convenient!
Now, what is the integral

$p(r_1, r_2) = \int_V\cdots\int_V\gamma(r_1, r_2, \dots, r_N)\,dr_3\cdots dr_N$?

This is the well-known two-particle distribution of a gas of hard
spheres in equilibrium. Its exact calculation is still an unsolved
problem and out of desperation physicists have resorted to an
expansion (called virial expansion) in powers of concentration

$c = N/V = 1/v$

(v = volume per particle).
In zeroth approximation (ideal gas) we have

$p(r_1, r_2) \sim 1/V^2$

and consequently

$\int_V\cdots\int_V C_{12}^2\,dr_1\cdots dr_N = \frac{1}{V^2}\iint_{B(v_1, v_2)}\{\phi(v_1^{-t}, v_2^{-t}, v_3, \dots, v_N) - \phi(v_1, v_2, \dots, v_N)\}\,dr_1\,dr_2$

where $B(v_1, v_2)$ is a region in six-dimensional space for which
$C_{12}^2 \ne 0$.

It should be clear that $r_1$ (or $r_2$) can be taken arbitrarily within
V (again the effect of the walls is being neglected), and the condition
$C_{12}^2 \ne 0$ simply means that $r_2$ must be such that in time
interval $(-t, 0)$, $(r_1, v_1)$ and $(r_2, v_2)$ (at t = 0) yield exactly one
collision.

For the hard sphere model, which we have assumed, one gets

$\int_V\cdots\int_V C_{12}^2\,dr_1\cdots dr_N = \frac{t\delta^2}{V}\int d\mathbf{l}\,\{\phi(A_{12}(\mathbf{l})R; 0) - \phi(R; 0)\}\,\psi_{12} = t\,\Omega_{12}\,\phi(R; 0)$

where we use now the notation of Section 16.
We see now that

$\phi(R; t) = \int_V\cdots\int_V D(v_1, \dots, v_N, r_1, \dots, r_N; t)\,dr_1\,dr_2\cdots dr_N = \phi(R; 0) + t\sum_{1\le i<j\le N}\Omega_{ij}\,\phi(R; 0) + \cdots$

and it should be recognized that

$\Omega = \sum_{1\le i<j\le N}\Omega_{ij}$

is precisely the master operator of formula (III.16.7).
The fun begins when one looks at

$\int_V\cdots\int_V C_{123}^3\,dr_1\cdots dr_N$

Here, for given $v_1, v_2, v_3$, one must investigate configurations
$(r_1, r_2, r_3)$ for which all three particles “felt” each other during the
time interval $(-t, 0)$.

One must first classify various “collision configurations.”
This has been done in detail by M. S. Green and taken over by
Brout.

One distinguishes: (a) genuine ternary collisions (i, j, k);
(b) sequences of binary collisions, e.g., (i, j)(i, k), which symbolizes
that first i collides with j and then i collides with k. (This is to be
distinguished from (i, k)(i, j), in which first i collides with k and
then i with j.) (c) “Virtual collisions,” e.g., (i, j)′(i, k)′, symbolizing
the situation that i would have collided with k except that it
collided with j first, which prevented it from carrying out its initial
intent. (d) “Collision cycles,” e.g., (i, j)(i, k)(j, i).

One neglects (a) because of high dilution and (d) because one 
argues that there are proportionately few of them for large N. 


The contributions of (b) and (c) are then calculated to yield:

$\frac{t^2}{2!}\,(\Omega_{ij}\Omega_{ik} + \Omega_{ik}\Omega_{ij} + \Omega_{ij}\Omega_{jk} + \Omega_{jk}\Omega_{ij} + \Omega_{ik}\Omega_{jk} + \Omega_{jk}\Omega_{ik})$

In addition to these terms we get from $C^4$ terms the contribution

$t^2\sum_{1\le i<j<k<l\le N}(\Omega_{ij}\Omega_{kl} + \Omega_{ik}\Omega_{jl} + \Omega_{il}\Omega_{jk})$

Altogether terms of second order in t combine to give

$\frac{t^2}{2!}\,\Omega^2\phi(R; 0)$

with the neglect of

$\frac{t^2}{2!}\sum_{1\le i<j\le N}\Omega_{ij}^2\,\phi(R; 0)$

One then argues that the $\Omega_{ij}^2$ correspond to “recollisions” (i, j)(i, j)
which can occur only through an intermediate collision with a wall.

Again one argues that in the limit $N\to\infty$ ($V\to\infty$) their
neglect is justified. (This point comes up also in the proof of the
propagation of chaos in Section 17.)

If all this is not bad enough just think of the higher terms
$C^4, C^5, \dots$

Apart from purely combinatorial difficulties (in classifying
various collision configurations) which might be handled by introducing
diagrams akin to those introduced by Feynman in quantum
electrodynamics, we are faced with a variety of limiting processes
($N\to\infty$, $V\to\infty$, $N/V = c$, $c\to 0$, $t\to\infty$, ct remaining fixed) whose
proper disentanglement is a real challenge!

You can see why Uhlenbeck in Appendix I, Part 1 called 
Brout’s attempt valiant! Still one begins to see light, and perhaps 
soon the whole story will be completely clear. 


20. The primary disadvantage of the master equation approach, 
at least as far as kinetic theory of gases is concerned, lies in the 
difficulty (if not impossibility!) of extending it to the nonspatially 
uniform case. 

It is thus not clear in what sense the full Boltzmann equation 
(i.e., with streaming terms) is a probabilistic equation. 

One can say that once spatial dependence is taken into account 
the situation becomes overdetermined and there is no room for 
averaging. 

There is no longer an analog of our set S of the ring model and 
the only “lack of specification” is in the initial distribution D. 

Whether this is sufficient for a satisfactory derivation of the 
full Boltzmann equation is not clear. Personally, I doubt it. 

The Bogoliubov approach (presented in detail in Appendix I,
Part 2) is, at the moment at least, too formal and leaves too many
questions unanswered to be considered in any way definitive.

One has (at least I have) the feeling that somehow during the 
short time t (duration of a collision), after which higher distribu- 
tions become functionally dependent on the single-particle distri- 
bution, some averaging must have taken place. 

And although for the formal success of the Bogoliubov recipe 
this is of no importance, it is a fundamental problem which badly 
needs clarification. 


21. The remaining sections of this chapter will be devoted to an 
exposition of Smoluchowski’s theory of fluctuations of concentra- 
tion. 

This theory furnishes an excellent example of a statistical 
theory in Physics and will help to further elucidate many points 
discussed elsewhere in this chapter. 

Smoluchowski developed his theory to explain the results of 
experiments of Svedberg and others on colloidal suspensions. He 
later used the same theory and the experimental results on which 
it was based as a basis of a brilliant and penetrating analysis of the 
limits of validity of the second law of thermodynamics. 

Svedberg observed in regular time intervals (39 observations 
per minute) the number of colloidal particles inside a fixed region 
A inside a large vessel. 

Here is the beginning of a sequence of 517 numbers taken from 
Svedberg’s observations: 


1, 2, 0, 0, 0, 2, 0, 0, 3, 2, 4, 1, 2, 3, 1, 0, 2,
1, 1, 1, 1, 3, 1, 1, 2, 5, 1, 1, 1, 0, 2, 3, 3, 1


How does one analyze such data, and what is to be learned 
from such an analysis? 

This is the kind of problem with which a natural scientist must 
often cope, and we shall now try to show how one goes about it. 
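
Before any theory is brought in, the first step with such data is plain bookkeeping. The sketch below is a small illustration using only the 34 values printed above (the remaining observations of the 517 are not reproduced here, so the numbers it prints describe this excerpt and not Svedberg's full series); the variable names are our own.

```python
from collections import Counter

# The 34 counts quoted in the text (beginning of Svedberg's sequence).
counts = [1, 2, 0, 0, 0, 2, 0, 0, 3, 2, 4, 1, 2, 3, 1, 0, 2,
          1, 1, 1, 1, 3, 1, 1, 2, 5, 1, 1, 1, 0, 2, 3, 3, 1]

mean = sum(counts) / len(counts)                 # average number in the region
freq = Counter(counts)                           # how often each count occurs
rel_freq = {k: freq[k] / len(counts) for k in sorted(freq)}

print(len(counts), mean)
print(rel_freq)
```

The mean of the excerpt is an estimate of the average number of particles in the region, and the relative frequencies are what one would later compare with a theoretical distribution of the counts.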

Suppose that the container has volume V and that it contains 
N colloidal particles. Now two assumptions are made: (1) each 
particle performs Brownian motion; (2) the particles are independ- 
ent of each other. 


The first assumption is not essential for the mathematical
theory. All that is required is a sufficiently detailed statistical
description of the motion of a single particle.

By this we mean that the position $r(t)$ of a representative
particle is a well-defined vector-valued stochastic process.

Lest the reader has skipped or has by now forgotten Section 6
of this chapter, let us remind him that all this means that $r(t)$ is
a one-parameter family $r(t; \omega)$ of measurable functions defined
on a set $\Omega$ in which a completely additive measure $\mu$ ($\mu(\Omega) = 1$)
has been introduced such that for three-dimensional Borel sets
$A_1, A_2, \dots, A_n$ we have

(III.21.1)  $\text{Prob}\{r(t_1)\in A_1, r(t_2)\in A_2, \dots, r(t_n)\in A_n\} = \mu\{r(t_1; \omega)\in A_1, r(t_2; \omega)\in A_2, \dots, r(t_n; \omega)\in A_n\}$ 35

Assumption (2) is of crucial importance. It represents an
enormous simplification of the problem, since it allows us to ignore
such important phenomena as coagulation; it can be justified only
for highly dilute colloidal suspensions.

When formalized, assumption (2) means the following. Let
$r_j(t; \omega_j)$, $\omega_j\in\Omega_j$, be the stochastic process describing the motion
of the jth particle. Assuming the particles to be indistinguishable,
one can think of $\Omega_1, \Omega_2, \dots, \Omega_N$ as being identical ($\Omega_j = \Omega$).

The statistical description of the joint motion of all particles
must of necessity take place in the product set

$\Omega\times\Omega\times\cdots\times\Omega$

To assume independence is to impose the product measure on
$\Omega\times\cdots\times\Omega$.

Several other assumptions of a more specialized nature will be
made when the need for them has become apparent.


35 A physicist who seldom feels the need for formalizing the ‘“‘obvious”’ 
thinks of a stochastic process as being well defined if he knows how to 
calculate the probabilities on the left-hand side of (III.21.1). A mathe- 
matician spends most of his time worrying whether these and more in- 
volved probabilities can be unambiguously defined. There are situations 
(see, e.g., Sections 1-3 of Chapter IV) in which such worries are well 
justified, but in this and in succeeding sections they are of secondary 
importance. 


22. Let $\psi(r)$ be the characteristic function of the region A, i.e.,

(III.22.1)  $\psi(r) = 1$ if $r\in A$,  $\psi(r) = 0$ if $r\notin A$

Then

(III.22.2)  $n_A(t) = \sum_{j=1}^{N}\psi(r_j(t))$

is simply the number of particles in A at time t.
Clearly $n_A(t)$ is a stochastic process, and a more consistent way
of writing would be:

(III.22.3)  $n_A(t) = n_A(t; \omega) = \sum_{j=1}^{N}\psi(r_j(t; \omega_j))$

where $\omega$ is an abbreviation for the N-tuple $(\omega_1, \omega_2, \dots, \omega_N)$ and
$\omega_j\in\Omega$. (In other words $\omega\in\Omega\times\Omega\times\cdots\times\Omega$.)
What is needed now is the calculation of probabilities

(III.22.4)  $\text{Prob}\{n_A(t_1) = n_1, n_A(t_2) = n_2, \dots, n_A(t_s) = n_s\}$

Let us show in detail how to calculate

(III.22.5)  $\text{Prob}\{n_A(t_1) = m, n_A(t_2) = n\}$

and it will become clear how to extend the calculation to obtain
the more general probabilities (III.22.4). We start with the obvious
formula

(III.22.6)  $(2\pi)^{-2}\int_0^{2\pi}\int_0^{2\pi}\exp[i(\xi k + \eta l)]\,d\xi\,d\eta = \delta_{k,0}\,\delta_{l,0}$


where k and l are integers and $\delta$ the familiar Kronecker symbol.
It follows immediately that

$\text{Prob}\{n_A(t_1) = m, n_A(t_2) = n\} = E\Big\{(2\pi)^{-2}\int_0^{2\pi}\int_0^{2\pi}\exp[-i(\xi m + \eta n)]\exp[i(\xi n_A(t_1) + \eta n_A(t_2))]\,d\xi\,d\eta\Big\}$

Here the symbol E indicates integration over the product space
$\Omega\times\cdots\times\Omega$. Interchanging the order of integration (Fubini!)
we get

(III.22.7)  $\text{Prob}\{n_A(t_1) = m, n_A(t_2) = n\} = (2\pi)^{-2}\int_0^{2\pi}\int_0^{2\pi}\exp[-i(\xi m + \eta n)]\,E\{\exp[i(\xi n_A(t_1) + \eta n_A(t_2))]\}\,d\xi\,d\eta$

It remains to calculate

$E\{\exp[i(\xi n_A(t_1) + \eta n_A(t_2))]\}$

which is simply the integral over $\Omega\times\cdots\times\Omega$ of

$\exp[i(\xi n_A(t_1; \omega) + \eta n_A(t_2; \omega))]$

with respect to the product measure postulated in $\Omega\times\cdots\times\Omega$.

Using (III.22.2) and the assumption of independence of
particles (i.e., that the measure in $\Omega\times\cdots\times\Omega$ is the product
measure) we get

$E\{\exp[i(\xi n_A(t_1) + \eta n_A(t_2))]\}$
    $= E\Big\{\exp\Big[i\sum_{j=1}^{N}(\xi\psi(r_j(t_1)) + \eta\psi(r_j(t_2)))\Big]\Big\}$
    $= \prod_{j=1}^{N}E\{\exp[i(\xi\psi(r_j(t_1)) + \eta\psi(r_j(t_2)))]\}$
    $= [E\{\exp[i(\xi\psi(r(t_1)) + \eta\psi(r(t_2)))]\}]^N$

The second equality is a direct consequence of independence
while the third is implied by the assumption that the particles are
indistinguishable (hence statistically identical).

Now, since $\psi$ assumes only values 0 and 1 we have

(III.22.8)  $\exp[i(\xi\psi(r(t_1)) + \eta\psi(r(t_2)))] = 1 + [\exp(i\xi) - 1]\,\psi(r(t_1)) + [\exp(i\eta) - 1]\,\psi(r(t_2)) + [\exp(i\xi) - 1][\exp(i\eta) - 1]\,\psi(r(t_1))\,\psi(r(t_2))$

and hence

(III.22.9)  $E\{\exp[i(\xi\psi(r(t_1)) + \eta\psi(r(t_2)))]\} = 1 + [\exp(i\xi) - 1]\,\text{Prob}\{r(t_1)\in A\} + [\exp(i\eta) - 1]\,\text{Prob}\{r(t_2)\in A\} + [\exp(i\xi) - 1][\exp(i\eta) - 1]\,\text{Prob}\{r(t_1)\in A, r(t_2)\in A\}$


Combining (III.22.7), (III.22.8), and (III.22.9) we obtain

(III.22.10)  $\text{Prob}\{n_A(t_1) = m, n_A(t_2) = n\} = (2\pi)^{-2}\int_0^{2\pi}\int_0^{2\pi}\exp[-i(\xi m + \eta n)]\,\big[1 + [\exp(i\xi) - 1]\,\text{Prob}\{r(t_1)\in A\} + [\exp(i\eta) - 1]\,\text{Prob}\{r(t_2)\in A\} + [\exp(i\xi) - 1][\exp(i\eta) - 1]\,\text{Prob}\{r(t_1)\in A, r(t_2)\in A\}\big]^N\,d\xi\,d\eta$

This formula is too complicated to be of much use, and we
must look for certain features which will help simplify matters.
The first source of simplification lies in the fact that both
N and V are large. In fact, one is only interested in formula
(III.22.10) in the limit

(III.22.11)  $N\to\infty$,  $V\to\infty$,  $N/V\to\nu$

(The meaning of $\nu$ is obvious; it is the average number of particles
per unit volume.)

The next source of simplification lies in the assumption of
“statistical equilibrium.”

This assumption is really twofold and the two parts are:

(a)  $\text{Prob}\{r(t_1)\in A_1, \dots, r(t_n)\in A_n\}$
depends only on time differences $t_j - t_i$ (stationarity).

(b)  $\text{Prob}\{r(t)\in A\} = |A|/V$
where |A| is the volume of A.

Finally a third assumption is needed which at first glance
looks stranger than it really is. This assumption is as follows:

(c)  $\text{Prob}\{r(t_1)\in A_1, \dots, r(t_n)\in A_n\} = \int_{A_1}\cdots\int_{A_n}W_V(r_1, t_1; r_2, t_2; \dots; r_n, t_n)\,dr_1\cdots dr_n$

where

$W_V(r_1, t_1; \dots; r_n, t_n) = V^{-1}P_V(r_1, t_1\,|\,r_2, t_2; \dots; r_n, t_n)$

and

$\lim_{V\to\infty}P_V(r_1, t_1\,|\,r_2, t_2; \dots; r_n, t_n) = P(r_1, t_1\,|\,r_2, t_2; \dots; r_n, t_n)$

exists in the weak sense.

It should be understood that $V\to\infty$ means that not only the
volume of the container but all of its “dimensions” approach $\infty$.

The meaning of assumption (c) is best explained by reference 
to Brownian motion. 

If a Brownian particle moves in a container with reflecting
walls it can be shown that

$W_V(r_1, t_1; \dots; r_n, t_n) = V^{-1}P_V(r_1, t_1\,|\,r_2, t_2)\,P_V(r_2, t_2\,|\,r_3, t_3)\cdots P_V(r_{n-1}, t_{n-1}\,|\,r_n, t_n)$

where

$P_V(r, t_1\,|\,\rho, t_2)$

is the fundamental solution of the diffusion equation

(III.22.12)  $\frac{\partial P}{\partial t} = \tfrac{1}{2}\Delta P$

subject to the boundary condition

$\frac{\partial P}{\partial n} = 0$

on the boundary of the container and satisfying the initial condition

$\lim_{t\to 0}P_V(r\,|\,\rho; t) = \delta(\rho - r)$

(units have been chosen to make the diffusion constant $\tfrac{1}{2}$).
For a finite container $P_V(r\,|\,\rho; t)$ depends on both size and
shape of the container. But again it can be shown that

$\lim_{V\to\infty}P_V(r, t_1\,|\,\rho, t_2) = [2\pi(t_2 - t_1)]^{-3/2}\exp[-\tfrac{1}{2}|\rho - r|^2(t_2 - t_1)^{-1}] = P(r, 0\,|\,\rho, t_2 - t_1)$

which is the fundamental solution of (III.22.12) for the whole
space.

It is curious to note that although we work so hard to get rid 
of V (by letting V->co) we could not in a natural way (at least I 
don’t know how) start with the whole space and eliminate the 
cumbersome limiting process. 

Now everything is easy! 

Taking the limit (III.22.11) and making use of all our assump- 
tions we get 


(III.22.13)  $W(m, n; t_2 - t_1) = W(m, t_1; n, t_2) = \lim\text{Prob}\{n_A(t_1) = m, n_A(t_2) = n\} = (2\pi)^{-2}\int_0^{2\pi}\int_0^{2\pi}\exp[-i(\xi m + \eta n)]\,F(\xi, \eta)\,d\xi\,d\eta$

where

(III.22.14)  $F(\xi, \eta) = \exp[\mu\{[\exp(i\xi) - 1] + [\exp(i\eta) - 1] + g(t_2 - t_1)[\exp(i\xi) - 1][\exp(i\eta) - 1]\}]$

(III.22.15)  $\mu = \nu|A|$

(III.22.16)  $g(t) = \frac{1}{|A|}\int_A\int_A P(r, 0\,|\,\rho, t)\,dr\,d\rho$

The meanings of $\mu$ and $g(t)$ are clear.

$\mu$ is the average number of particles in A and $g(t)$ is the
conditional probability (in the limit (III.22.11)) that a particle
starting at t = 0 in A will be in A at time t.
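
Formula (III.22.13) can be evaluated numerically without difficulty, and doing so also exhibits its structure. The sketch below is illustrative and rests on one observation of ours that is not stated in the text: since $(e^{i\xi}-1)(e^{i\eta}-1) = (e^{i(\xi+\eta)}-1) - (e^{i\xi}-1) - (e^{i\eta}-1)$, the function F is the characteristic function of the pair (X + Z, Y + Z) with X, Y, Z independent Poisson variables of means $\mu(1-g)$, $\mu(1-g)$ and $\mu g$. The code compares a Riemann sum for the double integral with the finite sum this decomposition gives; the function names, the grid size, and the values $\mu = 1.5$, $g = 0.4$ are arbitrary.

```python
import cmath
import math


def F(xi, eta, mu, g):
    """The function F of (III.22.14) for given mu and g = g(t2 - t1)."""
    a = cmath.exp(1j * xi) - 1.0
    b = cmath.exp(1j * eta) - 1.0
    return cmath.exp(mu * (a + b + g * a * b))


def W_fourier(m, n, mu, g, grid=48):
    """Riemann-sum evaluation of the double integral in (III.22.13)."""
    h = 2.0 * math.pi / grid
    total = 0.0 + 0.0j
    for p in range(grid):
        for q in range(grid):
            xi, eta = p * h, q * h
            total += cmath.exp(-1j * (xi * m + eta * n)) * F(xi, eta, mu, g)
    return (total / grid ** 2).real


def poisson(k, lam):
    """Poisson probability of the value k (zero for negative k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k) if k >= 0 else 0.0


def W_sum(m, n, mu, g):
    """Same quantity from the three-Poisson decomposition described above."""
    return sum(poisson(j, mu * g) * poisson(m - j, mu * (1 - g)) * poisson(n - j, mu * (1 - g))
               for j in range(min(m, n) + 1))


mu, g = 1.5, 0.4
for m in range(3):
    print([round(W_fourier(m, n, mu, g), 6) for n in range(3)],
          [round(W_sum(m, n, mu, g), 6) for n in range(3)])
```

In this picture Z counts the particles that are in A at both times, which is one way of reading the statement that g(t) is the probability of being found in A again.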

Smoluchowski worked with the quantity 


(III.22.17) P(t) = 1 — g(t) 


which he called “probability after-effect” (““Wahrscheinlichkeits- 
nachwirkung’’). 
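
To get a feeling for g(t) and the probability after-effect P(t) = 1 - g(t), it helps to work out a case in which the double integral (III.22.16) can be done by hand. The sketch below is a simplified illustration and not the three-dimensional situation of the text: it assumes a one-dimensional “region” A = (0, h) and the free one-dimensional kernel $(2\pi t)^{-1/2}\exp(-(x-y)^2/2t)$ (diffusion constant 1/2, as above). For that case an elementary integration gives $g(t) = \operatorname{erf}(h/\sqrt{2t}) - \sqrt{2t/\pi}\,(1 - e^{-h^2/2t})/h$, which the code checks against a midpoint-rule evaluation of the double integral. All names and parameter values are our own.

```python
import math


def g_closed(t, h):
    """Closed form of g(t) for the interval (0, h) in one dimension."""
    s = math.sqrt(2.0 * t)
    return math.erf(h / s) - s / (h * math.sqrt(math.pi)) * (1.0 - math.exp(-h * h / (2.0 * t)))


def g_numeric(t, h, cells=200):
    """Midpoint-rule value of (1/h) * double integral of the kernel over A x A."""
    dx = h / cells
    norm = 1.0 / math.sqrt(2.0 * math.pi * t)
    total = 0.0
    for a in range(cells):
        x = (a + 0.5) * dx
        for b in range(cells):
            y = (b + 0.5) * dx
            total += norm * math.exp(-(x - y) ** 2 / (2.0 * t))
    return total * dx * dx / h


h = 1.0
for t in (0.05, 0.1, 0.5, 2.0):
    g = g_closed(t, h)
    print(t, g, g_numeric(t, h), 1.0 - g)   # last column: after-effect P(t)
```

As t grows, g(t) falls from 1 toward 0 and P(t) rises correspondingly: the longer one waits, the less the later count remembers the earlier one.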
He obtained the formulas

$W(m, 0; m + k, t) = \exp(-\mu)\,\frac{\mu^m}{m!}\sum_{i=0}^{m}A_i^{(m)}E_{i+k}$,  $k \ge 0$

$W(m, 0; m - k, t) = \exp(-\mu)\,\frac{\mu^m}{m!}\sum_{i=k}^{m}A_i^{(m)}E_{i-k}$,  $0 \le k \le m$

where

$A_i^{(m)} = \binom{m}{i}P^i(1 - P)^{m-i}$

and

$E_i = \exp(-\mu P)\,\frac{(\mu P)^i}{i!}$

It is an exercise (a little tedious) to derive these formulas
from (III.22.13); Smoluchowski’s derivation (purely combinatorial)
is actually simpler than ours.

However, our derivation can be immediately extended to the 
calculation of probabilities (III.22.4) while Smoluchowski’s 
method would lead to complicated combinatorics. 
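
The “exercise” of matching Smoluchowski's formulas with (III.22.13) can at least be checked numerically. The sketch below is illustrative. It takes Smoluchowski's formulas in the form $W(m, 0; n, t) = e^{-\mu}\frac{\mu^m}{m!}\sum_i \binom{m}{i}P^i(1-P)^{m-i}\,e^{-\mu P}\frac{(\mu P)^{i+n-m}}{(i+n-m)!}$ (i particles leave, i + n - m enter), and compares them with the sum over the number j of particles present at both times, $\sum_j \pi(j; \mu g)\,\pi(m-j; \mu P)\,\pi(n-j; \mu P)$ with $\pi$ the Poisson probabilities and g = 1 - P; this second form is our own rewriting of the function F of (III.22.14). Function names and parameter values are arbitrary.

```python
import math


def poisson(k, lam):
    """Poisson probability of the value k (zero for negative k)."""
    return math.exp(-lam) * lam ** k / math.factorial(k) if k >= 0 else 0.0


def smoluchowski_joint(m, n, mu, P):
    """Joint probability of m particles at time 0 and n at time t (Smoluchowski's form)."""
    k = n - m
    total = 0.0
    for i in range(max(0, -k), m + 1):            # i = number of particles that leave
        leave = math.comb(m, i) * P ** i * (1.0 - P) ** (m - i)
        enter = poisson(i + k, mu * P)            # i + k particles must enter
        total += leave * enter
    return poisson(m, mu) * total


def three_poisson_joint(m, n, mu, P):
    """Same probability summed over the number j of particles present both times."""
    g = 1.0 - P
    return sum(poisson(j, mu * g) * poisson(m - j, mu * P) * poisson(n - j, mu * P)
               for j in range(min(m, n) + 1))


mu, P = 1.5, 0.6
for m in range(4):
    print([round(smoluchowski_joint(m, n, mu, P), 6) for n in range(4)])
```

Summing either expression over n recovers the Poisson probability of m, as it must for the distribution of the count at a single instant.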

In spite of the great deal of talking we have done to arrive at 
the formula (III.22.13) and in spite of its rather complex form, the 
purely mathematical content of this and the preceding section is 
relatively meager. 

An “expert” in probability might easily dismiss the whole 
thing as much ado over nothing or worse yet hurl at it the ultimate 
and presumably most devastating adjective “‘trivial.”’ 

But now, let us take a look at the final formulas from the point 
of view of a physicist. 

From Svedberg’s data he would compute the frequency
f(0, 0; t), say, with which 0 is followed by 0 after t sec. (t will have
to be a multiple of 1/39 since Svedberg took 39 counts per minute);
he would then equate the observed frequency with the theoretically
calculated probability W(0, 0; t) and obtain a numerical value for
g(t). Since in Svedberg’s experiments the particles were essentially
free Brownian particles he would set


g(t) = \frac{1}{|A|} \int_A \int_A \exp(-\tfrac14 \|r - \rho\|^2 / Dt) (4\pi Dt)^{-3/2} \, dr \, d\rho


(we now put back the actual diffusion constant D!) and calculate 
D. Now D for spherical particles happens to be given by the 
formula 


D = kT/(6\pi a \eta) = RT/(6\pi N a \eta)


where R is the universal gas constant, N the Avogadro number, a
the radius of the colloidal particle, and $\eta$ the viscosity coefficient
of the liquid in which the colloidal particles are suspended.

Thus he can find the Avogadro number N to be about
6.09 × 10^23.

To deduce a number of the order of 10^23 from Svedberg’s
numbers none of which exceeded 6 is downright miraculous!
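
The last step of the physicist's computation is a one-line inversion of the formula D = RT/(6\pi N a \eta). The sketch below is illustrative only: the function names are hypothetical and the particle radius used in the usage note is an assumed value, not Svedberg's data.

```python
from math import pi

R_GAS = 8.314462618  # universal gas constant in J/(mol K)


def diffusion_constant(temperature, radius, viscosity, avogadro):
    """D = R T / (6 pi N a eta) for a spherical particle (SI units)."""
    return R_GAS * temperature / (6.0 * pi * avogadro * radius * viscosity)


def avogadro_from_diffusion(d, temperature, radius, viscosity):
    """Solve the same formula for N once D has been measured."""
    return R_GAS * temperature / (6.0 * pi * radius * viscosity * d)
```

For example, with T = 293.15 K, the viscosity of water taken as about 1.0e-3 Pa s, and an assumed radius a = 2.0e-7 m, feeding the value of D produced by `diffusion_constant` back into `avogadro_from_diffusion` recovers the N one started from.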

Here then is again a result whose worth cannot be judged on 
its mathematical merits alone. 


23. From the formula for W(m, t_1; n, t_2) one can immediately
deduce the formula for


W(m, t_1) = \lim_{V \to \infty} Prob\{n_A(t_1) = m\}


In fact,


(III.23.1) W(m, t_1) = \sum_{n=0}^{\infty} W(m, t_1; n, t_2)


= (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi m) F(\xi, 0) \, d\xi


= \exp(-\mu) \mu^m / m!
This formula, of course, can also be derived directly. More 
interesting are formulas for 
W(n_1, t_1; n_2, t_2; \ldots; n_k, t_k)


= \lim_{V \to \infty} Prob\{n_A(t_1) = n_1, n_A(t_2) = n_2, \ldots, n_A(t_k) = n_k\}

The extension of the method of Section 22 is immediate and 
the result is the following: 


(III.23.2) W(n_1, t_1; n_2, t_2; \ldots; n_k, t_k)


= (2\pi)^{-k} \exp(-\mu) \int_0^{2\pi} \cdots \int_0^{2\pi} \exp(-i \sum_{j=1}^{k} n_j \xi_j) F(\xi_1, \ldots, \xi_k) \, d\xi_1 \cdots d\xi_k
where


(III.23.3) F(\xi_1, \ldots, \xi_k) = \exp[\mu \sideset{}{^*}\prod_{j=1}^{k} [1 + g_j(\exp(i\xi_j) - 1)]]


and the star on the product indicates that it must be interpreted 
in the following sense. Perform the multiplications as if the g’s 
were numbers and then replace 


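g_{i_1} g_{i_2} \cdots g_{i_s}
by
(III.23.4) g(t_{i_1} | t_{i_2}, \ldots, t_{i_s}) = \frac{1}{|A|} \int_A \cdots \int_A P(r_1, t_{i_1} | r_2, t_{i_2}; \ldots; r_s, t_{i_s}) \, dr_1 \cdots dr_s ³⁶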


(if s = 1, g_{i_1} is simply replaced by 1). For instance,


\sideset{}{^*}\prod_{j=1}^{3} [1 + g_j(\exp(i\xi_j) - 1)]


= 1 + (\exp(i\xi_1) - 1) + (\exp(i\xi_2) - 1) + (\exp(i\xi_3) - 1)
+ g(t_1 | t_2) (\exp(i\xi_1) - 1) (\exp(i\xi_2) - 1)
+ g(t_1 | t_3) (\exp(i\xi_1) - 1) (\exp(i\xi_3) - 1)
+ g(t_2 | t_3) (\exp(i\xi_2) - 1) (\exp(i\xi_3) - 1)
+ g(t_1 | t_2, t_3) (\exp(i\xi_1) - 1) (\exp(i\xi_2) - 1) (\exp(i\xi_3) - 1)
Suppose now that there is a process r(t)³⁷ whose transition
probabilities are P(r_1, t_1 | r_2, t_2; \ldots; r_n, t_n).
In other words a process such that 


(III.23.5) Prob\{r(t_1) = r_1 | r(t_2) \in A_2, \ldots, r(t_n) \in A_n\}


= \int_{A_2} \cdots \int_{A_n} P(r_1, t_1 | r_2, t_2; \ldots; r_n, t_n) \, dr_2 \cdots dr_n
(Three-dimensional free Brownian motion is such a process; 
for more details see Chapter IV.) 


36 It should be understood that the times t_1, ..., t_k as well as t_{i_1}, ..., t_{i_s}
are ordered.

37 We use the same notation to denote a different process. For the
sake of consistency of notation we should have used in preceding sections
the notation r_V(t).


With such a process we can associate the process n_A(t)³⁸
which can assume only nonnegative integral values and which is
defined by the following assignment of probabilities:


(III.23.6) Prob\{n_A(t_1) = n_1, \ldots, n_A(t_k) = n_k\}
= W(n_1, t_1; n_2, t_2; \ldots; n_k, t_k)


(the W’s being given by the formulas (III.23.2), (III.23.3)).

Is the process n_A(t) well defined? It is almost immediate to
show that the W’s satisfy the required consistency and continuity
conditions. All one needs is that


(III.23.7) \lim_{t \to 0} g(t) = 1

But are the W’s nonnegative? If the P’s can be obtained as
limiting values of P_V’s (see condition (c) of Section 22 of this
chapter) the nonnegativity of W’s is trivial, since they are limits of
nonnegative quantities. But can we be sure if the P’s are simply
transition probabilities of a process r(t)?

There ought to be a direct proof of nonnegativity from formulas
(III.23.2), (III.23.3) but the job looks messy and tedious.

Is it true that the P’s can always be obtained as limits of
P_V’s for an appropriately defined process r_V(t)?

Almost certainly, but I don’t have such a proof. 

What should we do then? 

Well, using the time-honored device we'll simply restrict
ourselves to processes r(t) (like free Brownian motion) for which
r_V(t) can be defined.

Having the consistency and nonnegativity we invoke Kolmogoroff’s
theorem (Section 6 of this chapter) and at last we are
“legal.”

The processes n_A(t) so defined will be called, in honor of
their true inventor, the Smoluchowski processes.


38 Again we use the same notation for a different object. Formerly we
should have used n_{A,V}(t).


24. The Smoluchowski process based on Brownian motion of free 
particles is the simplest realistic statistical model on which the 
questions of reversibility and recurrence can be discussed in some 
detail. It is true that the Ehrenfest “dog-flea” model which we
have discussed elsewhere in this chapter in great detail is simpler. 
But ingenious and useful as it is, it is also entirely artificial. The 
Smoluchowski model is firmly rooted in Physics and conclusions 
drawn from it can be checked against a substantial body of ex- 
perimental data. We shall discuss the problem of recurrence in 
forthcoming sections. 

Here we shall make a few remarks to round out the physical 
background. 

First let us define the conditional probabilities 


P(n_1, t_1 | n_2, t_2; \ldots; n_k, t_k)


by the usual formula


(III.24.1) P(n_1, t_1 | n_2, t_2; \ldots; n_k, t_k)
= \frac{W(n_1, t_1; n_2, t_2; \ldots; n_k, t_k)}{W(n_1, t_1)}
= \frac{W(n_1, t_1; n_2, t_2; \ldots; n_k, t_k)}{W(n_1)}


and observe that
(III.24.2) P(n_1, t_1 | n_2, t_2; \ldots; n_k, t_k)
= P(n_1, 0 | n_2, t_2 - t_1; \ldots; n_k, t_k - t_1)


Let us now calculate 


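(III.24.3) E\{n_A(0) = m | n_A(t)\} = \sum_{n=0}^{\infty} n P(m, 0 | n, t)

or the mean number of particles in A if at t = 0 there were m
particles in A.
We have

\sum_{n=0}^{\infty} n P(m, 0 | n, t)
= \frac{1}{W(m)} \sum_{n=0}^{\infty} n W(m, 0; n, t)
= \frac{1}{W(m)} \sum_{n=0}^{\infty} n (2\pi)^{-2} \int_0^{2\pi} \int_0^{2\pi} \exp[-i(\xi m + \eta n)] F(\xi, \eta) \, d\xi \, d\eta
= \frac{1}{W(m)} \sum_{n=0}^{\infty} n (2\pi)^{-1} \int_0^{2\pi} \exp(-i\eta n) f(\eta) \, d\eta
where
f(\eta) = (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi m) F(\xi, \eta) \, d\xi
Since
f(\eta) = \sum_{n=0}^{\infty} [(2\pi)^{-1} \int_0^{2\pi} \exp(-i\zeta n) f(\zeta) \, d\zeta] \exp(i\eta n)
we have
f'(\eta) = i \sum_{n=0}^{\infty} n [(2\pi)^{-1} \int_0^{2\pi} \exp(-i\zeta n) f(\zeta) \, d\zeta] \exp(i\eta n)
and hence

(III.24.4) E\{n_A(0) = m | n_A(t)\} = \frac{1}{W(m)} \frac{1}{i} f'(0)
= \frac{1}{W(m)} (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi m) \frac{1}{i} (\frac{\partial F}{\partial \eta})_{\eta=0} \, d\xi
= \frac{\mu}{W(m)} (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi m) \exp(\mu[\exp(i\xi) - 1]) \{1 - g(t) + g(t) \exp(i\xi)\} \, d\xi
= m g(t) + \mu(1 - g(t)) = \mu + (m - \mu) g(t)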


This formula is in complete agreement with the macroscopic
theory of diffusion if one recalls that


g(t) = \frac{1}{|A|} \int_A \int_A \exp(-\tfrac14 \|r - \rho\|^2 / Dt) (4\pi Dt)^{-3/2} \, dr \, d\rho


In fact, $\mu$ is simply the normal (and ultimate since $g(t) \to 0$
as $t \to \infty$) amount of “stuff” in A while $(m - \mu) g(t)$ is what is still
left in A at time t of the initial “extra stuff” (which may be
negative!) $(m - \mu)$ under the classical laws of diffusion.

Similarly we find that 


(III.24.5) E\{n_A(0) = m | (n_A(t) - \mu - (m - \mu) g(t))^2\}
= m g(t) (1 - g(t)) + \mu (1 - g(t))

Thus the relative fluctuation is


[m g(t) (1 - g(t)) + \mu (1 - g(t))]^{1/2} [\mu + (m - \mu) g(t)]^{-1}

and for large $\mu$ and $m \sim \mu$ it is of the order $\mu^{-1/2}$ and hence
negligible. Thus one can dispense with probability and use the
phenomenological diffusion theory.
For small values of $\mu$ (in Svedberg’s experiment referred to
above $\mu$ = 1.54) the fluctuations overshadow the mean, and we
are in a realm where statistical treatment is obligatory.
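
The mean \mu + (m - \mu) g(t) and the variance m g(t)(1 - g(t)) + \mu(1 - g(t)) can be confirmed by brute-force summation over the conditional distribution. The sketch below is an illustration under the assumption (which follows from the form of F) that, given m particles at time 0, each stays in A with probability g(t) while an independent Poisson number with mean \mu(1 - g(t)) enters; the function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson(k, lam):
    """Poisson probability of the value k for mean lam (zero for k < 0)."""
    if k < 0:
        return 0.0
    return exp(-lam) * lam**k / factorial(k)


def transition(m, n, mu, g):
    """P(m, 0 | n, t): probability of n particles at time t given m at time 0."""
    return sum(
        comb(m, s) * g**s * (1.0 - g) ** (m - s) * poisson(n - s, mu * (1.0 - g))
        for s in range(min(m, n) + 1)
    )


def conditional_moments(m, mu, g, n_max=80):
    """Mean and variance of n_A(t) given n_A(0) = m, by direct summation."""
    probs = [transition(m, n, mu, g) for n in range(n_max)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
    return mean, var
```

For m = 4, \mu = 1.54 and an arbitrarily chosen g = 0.3 the summation returns 2.278 and 1.918, as the two closed formulas predict.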


25. For the Ehrenfest “dog-flea” model we had
(III.25.1) P(n_1 | n_2, \ldots, n_k) = P(n_1 | n_2) P(n_2 | n_3) \cdots P(n_{k-1} | n_k)


This is the “Markoff property” which makes the model easy
to handle mathematically.

Is the Smoluchowski model Markoffian? In other words is 
it true that 


(III.25.2) P(n_1, t_1 | n_2, t_2; \ldots; n_k, t_k)
= P(n_1, t_1 | n_2, t_2) \cdots P(n_{k-1}, t_{k-1} | n_k, t_k)


In general the answer is “no”, and one must resort to an arti- 
ficial example before one gets a Markoffian model. 
To see this, note that (III.25.2) implies 


P(m, 0 | n, t) = \sum_{k=0}^{\infty} P(m, 0 | k, \tau; n, t)


= \sum_{k=0}^{\infty} P(m, 0 | k, \tau) P(k, 0 | n, t - \tau)


or, in other words, the matrix


\Pi(t) = ((P(m, 0 | n, t)))


satisfies the equation


(III.25.3) \Pi(t) = \Pi(\tau) \Pi(t - \tau) ³⁹
Let us see whether at least (III.25.3) holds for our processes. 
Writing 


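\exp[\mu\{(\exp(i\xi) - 1) + (\exp(i\eta) - 1) + g(t) (\exp(i\xi) - 1) (\exp(i\eta) - 1)\}]
= \exp[\mu(\exp(i\xi) - 1)] \exp[\mu(\exp(i\eta) - 1)]
\cdot \sum_{k=0}^{\infty} \frac{\mu^k g^k(t)}{k!} (\exp(i\xi) - 1)^k (\exp(i\eta) - 1)^k


we see that

(III.25.4) P(m, 0 | n, t) = \frac{W(m, 0; n, t)}{W(m)} = \sum_{k=0}^{\infty} \varphi_{mk} g^k(t) \psi_{kn}


where
(III.25.5) \varphi_{mk} = \exp(\mu) m! \mu^{-m} (2\pi)^{-1} \int_0^{2\pi} \exp(-i\xi m)
\cdot \exp[\mu(\exp(i\xi) - 1)] (\exp(i\xi) - 1)^k \, d\xi
and
(III.25.6) \psi_{kn} = \frac{\mu^k}{k!} (2\pi)^{-1} \int_0^{2\pi} \exp(-i\eta n) \exp[\mu(\exp(i\eta) - 1)]
\cdot (\exp(i\eta) - 1)^k \, d\eta
Since
P(m, 0 | n, 0) = \delta_{mn}


(this is a consequence of g(0) = 1) we have


(III.25.7) \sum_{k=0}^{\infty} \varphi_{mk} \psi_{kn} = \delta_{mn}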


39 This equation (the so-called Chapman-Kolmogoroff-Smoluchowski
equation) is often erroneously taken as the definition of a Markoff process. 
There is an example due to P. Lévy for which (III.25.3) holds while 
(III.25.2) does not. 


and hence (III.25.4) represents a diagonalization⁴⁰ of the matrix
\Pi(t).
In particular, the numbers
1, g(t), g^2(t), g^3(t), \ldots
are the eigenvalues of \Pi(t).
It is now clear that for (III.25.3) to hold it is necessary that 


(III.25.8) g(t) = g(\tau) g(t - \tau)
and since g is measurable it must be of the form
(III.25.9) g(t) = \exp(-at), a > 0


Since for experiments with colloidal particles g(t) is never of 
the exponential form (III.25.9) the process cannot be Markoffian. 
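
The necessity of (III.25.8) can be seen numerically by building a truncated version of the matrix \Pi(t) and comparing it with the product \Pi(\tau)\Pi(t - \tau). The sketch below is illustrative: the function names are hypothetical, the matrices are truncated at an assumed size of 40, and the non-exponential choice g(t) = 1/(1 + t) in the usage note is an arbitrary test function, not one taken from experiment.

```python
from math import comb, exp, factorial


def poisson(k, lam):
    """Poisson probability of the value k for mean lam (zero for k < 0)."""
    if k < 0:
        return 0.0
    return exp(-lam) * lam**k / factorial(k)


def transition(m, n, mu, g):
    """P(m, 0 | n, t) when the probability of remaining in A is g = g(t)."""
    return sum(
        comb(m, s) * g**s * (1.0 - g) ** (m - s) * poisson(n - s, mu * (1.0 - g))
        for s in range(min(m, n) + 1)
    )


def pi_matrix(mu, g, size):
    """Upper-left size x size block of the matrix Pi(t)."""
    return [[transition(m, n, mu, g) for n in range(size)] for m in range(size)]


def matmul(a, b):
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def ck_defect(mu, g_func, t, tau, size=40, corner=8):
    """Largest entry of |Pi(t) - Pi(tau) Pi(t - tau)| over small states only,
    where truncating the matrices has a negligible effect."""
    lhs = pi_matrix(mu, g_func(t), size)
    rhs = matmul(pi_matrix(mu, g_func(tau), size), pi_matrix(mu, g_func(t - tau), size))
    return max(abs(lhs[m][n] - rhs[m][n]) for m in range(corner) for n in range(corner))
```

With g(t) = exp(-0.7 t) the defect is at the level of rounding error, while with g(t) = 1/(1 + t) it is of the order of several hundredths, so that (III.25.3) visibly fails.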

The question now arises whether processes for which g(t)
is of the exponential form (III.25.9) are Markoffian.

It is easily proved (though the underlying calculations are too
tedious to be reproduced here) that if in addition to (III.25.9) we
also have (for t_1 < t_2 < \cdots < t_k)


(III.25.10) g(t_1 | t_2, t_3, \ldots, t_k) = g(t_1 | t_k) = g(t_k - t_1)


then (III.25.2) holds and the process is indeed Markoffian. What
remains unresolved is the following exasperating problem.
Does (III.25.9) imply (III.25.10)? The answer seems almost
surely “no” because why should the form of g(t_1 | t_2) determine the
higher “after-effects”?
Still the problem is open. If one could find a process for 
which 


g(t) = exp (— at) 


40 Actually (III.25.7) is not sufficient to show that (III.25.4) furnishes
a diagonalization of \Pi(t). One needs the relation


(III.25.7a) \sum_{m=0}^{\infty} \psi_{lm} \varphi_{mk} = \delta_{l,k}

which for infinite matrices does not follow from (III.25.7) and, in fact,
need not even be true. Fortunately, in our case (III.25.7a) can be
established by a separate calculation which is not, however, particularly simple.
Perhaps it will amuse the reader to find a proof.


but for which (III.25.10) would not hold we would have another 
example (less artificial than the example of P. Lévy referred to 
above) of a non-Markoffian process for which (III.25.3) holds. 


26. Are there processes satisfying (III.25.10)? 

Clearly (III.25.10) can be satisfied only if the motion of each
individual particle is such that being in A at times t_1 and t_2 implies
being in A at all intermediate times.

Processes satisfying (III.25.10) will be called persistent.

The simplest persistent process was first considered by Fürth
who observed the number n_A(t) of pedestrians in an interval A
marked off on a sidewalk.

Pedestrians are assumed to move with the same speed |v_0| in
either of the two directions without ever changing their direction of
motion except at two artificial reflecting barriers placed at -L
and +L (we'll soon let $L \to \infty$). The interval A can be taken to
be $(-\tfrac12 l, \tfrac12 l)$.

Let $\varphi_L(x)$ be defined as follows


\varphi_L(x) = x - 2pL, |x - 2pL| < L, p even
\varphi_L(x) = -x + 2pL, |x - 2pL| < L, p odd
We can set
(III.26.1) r_{j,L}(t) = \varphi_L(x_j + \varepsilon_j |v_0| t)
where x_1, \ldots, x_N, \varepsilon_1, \ldots, \varepsilon_N are all independent each x being
uniformly distributed in (-L, L) and each $\varepsilon$ assuming values
+1 and -1 with equal probabilities (1/2).

In purely mathematical terms this means that our set $\Omega$ is the
product set


(III.26.2) (-L, L) \times \{-1, 1\}


the measure in (-L, L) being the ordinary Lebesgue measure
(normalized to 1) and the measure on {-1, 1} assigns weights 1/2
and 1/2 to the alternatives. The measure on (III.26.2) is the product
measure (this is the mathematical translation of the assumption
that the starting point and the direction of motion are independent).


It is almost immediate that as $L \to \infty$ we obtain


(III.26.3) g(t) = 1 - l^{-1} |v_0| t for |v_0| t \le l, \quad g(t) = 0 for |v_0| t > l


The process is clearly persistent.
If we generalize the process by setting


r_j(t) = \varphi_L(x_j + v_j t)


where the v_j’s are assumed to have the same even density function
f(v)⁴¹ we obtain again a persistent process for which g(t) (again
in the limit $L \to \infty$) is given by the formula:


(III.26.4) g(t) = 2 \int_0^{\infty} \max(1 - l^{-1} v t, 0) f(v) \, dv
= 2 \int_0^{l/t} (1 - l^{-1} v t) f(v) \, dv


This shows that not every g(t) can serve to define a persistent 
process of the foregoing type. 

Clearly g(t) must be such that if we solve for f(v) the solution 
is nonnegative. 

Assuming g to be twice differentiable for t > 0, we obtain by
differentiating (III.26.4)


2 l^{-1} \int_0^{l/t} v f(v) \, dv = -g'(t)


and differentiating once again

2 l t^{-3} f(l t^{-1}) = g''(t)
Thus if $g''(t) \ge 0$ we get the solution
(III.26.5) f(v) = \tfrac12 l^2 |v|^{-3} g''(l |v|^{-1})


41 The set $\Omega$ is now $(-L, L) \times (-\infty, \infty)$, where the measure on
$(-\infty, \infty)$ is defined by


\mu(E) = \int_E f(v) \, dv.


All assumptions of independence are, of course, preserved.


but in order that

\int_{-\infty}^{\infty} f(v) \, dv = 1

we must further assume that


\lim_{t \to \infty} t g'(t) = \lim_{t \to 0} t g'(t) = 0


Clearly
g(t) = \exp(-at)


satisfies the conditions and hence a persistent process can be based
on this function. Thus a Markoffian process n_A(t) can be
constructed but its underlying velocity distribution


f(v) = \tfrac12 a^2 l^2 |v|^{-3} \exp(-a l |v|^{-1})


is exceedingly artificial.
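
That this velocity density really reproduces the exponential g(t) through (III.26.4) can be confirmed by numerical quadrature. The sketch below is only an illustration: the function names are hypothetical, the density is written as \tfrac12 a^2 l^2 |v|^{-3} \exp(-al/|v|), and a plain midpoint rule with an assumed number of steps stands in for the integral.

```python
from math import exp


def velocity_density(v, a, l):
    """f(v) = (1/2) a^2 l^2 |v|^(-3) exp(-a l / |v|), taken to be 0 at v = 0."""
    s = abs(v)
    if s == 0.0:
        return 0.0
    return 0.5 * a * a * l * l * s**-3 * exp(-a * l / s)


def g_from_density(t, density, l, steps=20000):
    """Midpoint-rule value of 2 * integral_0^{l/t} (1 - v t / l) f(v) dv."""
    upper = l / t
    h = upper / steps
    total = 0.0
    for k in range(steps):
        v = (k + 0.5) * h
        total += (1.0 - v * t / l) * density(v)
    return 2.0 * total * h
```

With a = 0.7 and l = 1.5, for example, `g_from_density(t, lambda v: velocity_density(v, 0.7, 1.5), 1.5)` agrees with exp(-0.7 t) to about six decimals for t of order one.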

The reader must have noticed that the persistence of processes 
considered in this section is due to the fact that A (which in our 
case is an interval) is connected. 

Were A a union of two or more intervals we could not main- 
tain that 


(III.26.6) g(t_1 | t_2, t_3) = g(t_1 | t_3)


Since from Svedberg-like data one could decide whether 
(III.26.6) is true, we have here the amusing possibility of deter- 
mining the connectivity of a set (in one dimension) by a statistical 
analysis of counts. 

One final remark to round off this section. 

With the process n_A(t) we can associate the process e(t) as
follows:

(III.26.7) e(t) = 1 if n_A(t) = 0, \quad e(t) = 0 if n_A(t) \ne 0


Clearly the process (III.26.7) is stationary and
(III.26.8) E\{e(t)\} = Prob\{n_A(t) = 0\} = \exp(-\mu)


Furthermore


E\{e(t_1) e(t_2)\} = Prob\{n_A(t_1) = 0, n_A(t_2) = 0\}
= W(0, t_1; 0, t_2)


= (2\pi)^{-2} \int_0^{2\pi} \int_0^{2\pi} \exp[\mu\{(\exp(i\xi) - 1) + (\exp(i\eta) - 1)
+ g(t_2 - t_1) (\exp(i\xi) - 1) (\exp(i\eta) - 1)\}] \, d\xi \, d\eta
= \exp(-\mu) \exp[-\mu(1 - g(t_2 - t_1))]
Now let
\mu = \log 2
and
\tilde{e}(t) = 2e(t) - 1
We have
E\{\tilde{e}(t)\} = 0
and


E\{\tilde{e}(t_1) \tilde{e}(t_2)\} = 2^{g(t_2 - t_1)} - 1
It thus follows that every function of the form
2^{g(t)} - 1
can be a covariance function of a stationary process assuming
values +1 and -1 provided one can base a Smoluchowski process
on g(t).
In particular, if

g''(t) \ge 0

and
\lim_{t \to \infty} t g'(t) = \lim_{t \to 0} t g'(t) = 0


then
2^{g(t)} - 1


can be a covariance of a ±1 stationary process.


27. Formulas (III.23.2) and (III.23.3) although quite explicit are
of such complexity that it seems almost impossible to use them for
a general discussion of life times and recurrence times of states.

However the state 0 (no particles in A) can be handled in 
complete detail and this is already of interest. 

We first study the problem of the life time (also called per- 
sistence time) of the state 0. 

The problem is to calculate 


(III.27.1) P(0, 0 | 0, \Delta t; 0, 2\Delta t; \ldots; 0, n\Delta t)
especially in the limit
(III.27.2) \Delta t \to 0, \quad n\Delta t = t


This limit can be defined as the probability that the life time
of the state 0 is greater than t.
It follows from our basic formula (III.23.2) that


P(0, 0 | 0, \Delta t; 0, 2\Delta t; \ldots; 0, n\Delta t)
= (2\pi)^{-(n+1)} \int_0^{2\pi} \cdots \int_0^{2\pi} F(\xi_0, \xi_1, \ldots, \xi_n) \, d\xi_0 \cdots d\xi_n


It must now be observed that 


\sideset{}{^*}\prod_{j=0}^{n} [1 + g_j(\exp(i\xi_j) - 1)] = \sideset{}{^*}\prod_{j=0}^{n} [(1 - g_j) + g_j \exp(i\xi_j)]

= (1 - g_n) \sideset{}{^*}\prod_{j=0}^{n-1} [(1 - g_j) + g_j \exp(i\xi_j)]
+ \exp(i\xi_n) g_n \sideset{}{^*}\prod_{j=0}^{n-1} [(1 - g_j) + g_j \exp(i\xi_j)]
where both


(1 - g_n) \sideset{}{^*}\prod_{j=0}^{n-1} and g_n \sideset{}{^*}\prod_{j=0}^{n-1}


are to be interpreted symbolically all the way.
It now follows that


(2\pi)^{-1} \int_0^{2\pi} F(\xi_0, \xi_1, \ldots, \xi_n) \, d\xi_n
= \exp[\mu (1 - g_n) \sideset{}{^*}\prod_{j=0}^{n-1} [(1 - g_j) + g_j \exp(i\xi_j)]]


and by repeated applications of this procedure we obtain 
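(III.27.3) P(0, 0 | 0, \Delta t; \ldots; 0, n\Delta t) = \exp[\mu \sideset{}{^*}\prod_{j=0}^{n} (1 - g_j)]


The symbolic product
\sideset{}{^*}\prod_{j=0}^{n} (1 - g_j)


can now be evaluated by our rules and one obtains (using the
elementary inclusion-exclusion principle of combinatorial analysis)


(III.27.4) \sideset{}{^*}\prod_{j=0}^{n} (1 - g_j)


= -\sum_{k=0}^{n-1} Prob\{r(k\Delta t) \in A | r((k+1)\Delta t) \notin A, \ldots, r(n\Delta t) \notin A\}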


k=0 


This formula becomes particularly simple for persistent pro- 
cesses since then the summand is simply 


1 - g(\Delta t)
Thus for persistent processes
(III.27.5) P(0, 0 | 0, \Delta t; \ldots; 0, n\Delta t) = \exp[-n\mu(1 - g(\Delta t))]


and it is seen that the answer is the same as if the process were
Markoffian. This is clearly a peculiarity of state 0.
In the limit $\Delta t \to 0$, $n\Delta t = t$ we get


\exp[\mu g'(0) t]
provided g’(0) exists.
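
The symbolic evaluation of the starred product is easy to mechanize for persistent processes, and doing so gives a check on the result exp[-n\mu(1 - g(\Delta t))]. The sketch below is an illustration with hypothetical function names. It expands the product over all subsets of the indices 0, ..., n, replaces a product of two or more g's by g(t_max - t_min), which is the persistence rule (III.25.10), and replaces a single g by 1.

```python
from itertools import combinations
from math import exp


def starred_product_at_zero(n, g, dt):
    """Symbolic product of (1 - g_j), j = 0..n, for a persistent process.

    Each subset S of {0, ..., n} contributes (-1)^|S| times the symbolic
    value of the product of its g's: 1 if S has at most one element,
    g((max S - min S) * dt) otherwise.
    """
    total = 0.0
    for size in range(n + 2):
        for subset in combinations(range(n + 1), size):
            if size <= 1:
                value = 1.0
            else:
                value = g((subset[-1] - subset[0]) * dt)
            total += (-1) ** size * value
    return total


def lifetime_probability(mu, g, t, n):
    """exp[-n mu (1 - g(t/n))], the persistent-process lifetime formula."""
    dt = t / n
    return exp(-n * mu * (1.0 - g(dt)))
```

For any g the subset sum collapses to -n(1 - g(\Delta t)), because all subsets whose extreme indices differ by more than one cancel in pairs. With g(t) = exp(-0.7 t), \mu = 1.5 and t = 2, `lifetime_probability` approaches exp(\mu g'(0) t) = exp(-2.1) as n grows.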
28. Much more interesting is the result for the recurrence time. 
Recurrence time of 0 is the life time (persistence time) of
the lumped state “not zero” ($\bar{0}$) and herein lies the greater difficulty
of the problem.

We must now calculate


(III.28.1) P(0, 0 | \bar{0}, \Delta t; \bar{0}, 2\Delta t; \ldots; \bar{0}, n\Delta t)
= \sum_{k_1 \ne 0, \ldots, k_n \ne 0} P(0, 0 | k_1, \Delta t; k_2, 2\Delta t; \ldots; k_n, n\Delta t) ⁴²


The sum in (III.28.1) is equal to 


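\sum_{k_1 \ne 0, \ldots, k_n \ne 0} (2\pi)^{-(n+1)} \int_0^{2\pi} \cdots \int_0^{2\pi} \exp[-i(k_1 \xi_1 + \cdots + k_n \xi_n)]
\cdot F(\xi_0, \xi_1, \ldots, \xi_n) \, d\xi_0 \cdots d\xi_n


and we note that


\sum_{k_n \ne 0} (2\pi)^{-1} \int_0^{2\pi} \exp[-i k_n \xi_n] F(\xi_0, \xi_1, \ldots, \xi_n) \, d\xi_n

= F(\xi_0, \xi_1, \ldots, \xi_{n-1}, 0) - (2\pi)^{-1} \int_0^{2\pi} F(\xi_0, \ldots, \xi_n) \, d\xi_n

= \exp[\mu \sideset{}{^*}\prod_{j=0}^{n-1} ((1 - g_j) + g_j \exp(i\xi_j))]
- \exp[\mu (1 - g_n) \sideset{}{^*}\prod_{j=0}^{n-1} ((1 - g_j) + g_j \exp(i\xi_j))]
Continuing in this way we get


\sum_{k_{n-1} \ne 0, k_n \ne 0} (2\pi)^{-2} \int_0^{2\pi} \int_0^{2\pi} \exp[-i(k_{n-1} \xi_{n-1} + k_n \xi_n)]
\cdot F(\xi_0, \ldots, \xi_n) \, d\xi_{n-1} \, d\xi_n

= \exp[\mu \sideset{}{^*}\prod_{j=0}^{n-2}] - \exp[\mu(1 - g_{n-1}) \sideset{}{^*}\prod_{j=0}^{n-2}] - \exp[\mu(1 - g_n) \sideset{}{^*}\prod_{j=0}^{n-2}]
+ \exp[\mu(1 - g_{n-1})(1 - g_n) \sideset{}{^*}\prod_{j=0}^{n-2}]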
and finally (observing that integration with respect to $\xi_0$ is somewhat
different from other integrations)


P(0, 0 | \bar{0}, \Delta t; \ldots; \bar{0}, n\Delta t)
= \exp[\mu(1 - g_0)] - \sum_{1 \le i \le n} \exp[\mu(1 - g_0)(1 - g_i)]
+ \sum_{1 \le i < j \le n} \exp[\mu(1 - g_0)(1 - g_i)(1 - g_j)]
- \sum_{1 \le i < j < k \le n} \exp[\mu(1 - g_0)(1 - g_i)(1 - g_j)(1 - g_k)] + \cdots


with the obvious understanding that all exponents are to be taken
symbolically.


42 In the limit $\Delta t \to 0$, $n\Delta t = t$ this probability will go to 0. This is
the same difficulty we have encountered in Section 6 of this chapter. To
get a sensible limit we must divide

P(0, 0 | \bar{0}, \Delta t; \bar{0}, 2\Delta t; \ldots; \bar{0}, n\Delta t)


by P(0, 0 | \bar{0}, \Delta t) and then pass to the limit. This is in accordance with
Smoluchowski’s definition of the mean recurrence time.

Using formulas analogous to (III.27.4) we obtain 


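P(0, 0 | \bar{0}, \Delta t; \ldots; \bar{0}, n\Delta t)
= 1 - \sum_{i=1}^{n} \exp[-\mu Prob\{r(0) \in A | r(i\Delta t) \notin A\}]


+ \sum_{1 \le i < j \le n} \exp[-\mu(Prob\{r(0) \in A | r(i\Delta t) \notin A, r(j\Delta t) \notin A\}


+ Prob\{r(i\Delta t) \in A | r(j\Delta t) \notin A\})] - \cdots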


Again considerable simplification is achieved for persistent 
processes. 
Setting 


(III.28.3) h(t) = 1 - g(t) ⁴³
we get (for persistent processes only)


(III.28.4) P_n = P(0, 0 | \bar{0}, \Delta t; \ldots; \bar{0}, n\Delta t)
= 1 - \sum_{i=1}^{n} \exp[-\mu h(i\Delta t)]
+ \sum_{1 \le i < j \le n} \exp[-\mu h(i\Delta t)] \exp[-\mu h((j - i)\Delta t)] - \cdots


43 h(t) is exactly Smoluchowski’s P(t) (see III.22.17). The reason for
changing notation is that by now the letter P must be tired from so much
use.


Consider now the generating function 
(III.28.5) H(z) = \sum_{i=1}^{\infty} \exp[-\mu h(i\Delta t)] z^i
and note that


\sum_{i=1}^{n} \exp[-\mu h(i\Delta t)]


is the coefficient of z^n in the expansion of
(1 - z)^{-1} H(z),

\sum_{1 \le i < j \le n} \exp[-\mu h(i\Delta t)] \exp[-\mu h((j - i)\Delta t)]

the coefficient of z^n in the expansion of
(1 - z)^{-1} H^2(z),
etc.
It thus follows that


\sum_{n=1}^{\infty} P_n z^n = (1 - z)^{-1} z - (1 - z)^{-1} H(z) + (1 - z)^{-1} H^2(z) - \cdots


or
(III.28.6) (1 - z) \sum_{n=1}^{\infty} P_n z^n = z - \frac{H(z)}{1 + H(z)}
Rewriting (III.28.6) in the equivalent form
(III.28.7) z + \frac{1 - z}{P_1} - \sum_{k=1}^{\infty} \frac{P_k - P_{k+1}}{P_1} z^{k+1} = \frac{1}{P_1 (1 + H(z))}


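and setting

z = \exp(-s\Delta t)

we see that


\sum_{k=1}^{\infty} \frac{P_k - P_{k+1}}{P_1} z^{k+1} = \exp(-s\Delta t) \int_0^{\infty} \exp(-st) \, d\sigma_{\Delta t}(t)

where

\sigma_{\Delta t}(t) = \sum_{k\Delta t \le t} \frac{P_k - P_{k+1}}{P_1}


Assuming g’(0) to exist we have

P_1 = P(0, 0 | \bar{0}, \Delta t) = 1 - P(0, 0 | 0, \Delta t)
= 1 - \exp[-\mu(1 - g(\Delta t))] = 1 - \exp[-\mu h(\Delta t)] \sim \mu h'(0) \Delta t
= \alpha \Delta t \quad (\alpha = \mu h'(0))


and hence from (III.28.7)


1 + s\alpha^{-1} - \lim_{\Delta t \to 0} \int_0^{\infty} \exp(-st) \, d\sigma_{\Delta t}(t)
= [\alpha \int_0^{\infty} \exp(-st) \exp(-\mu h(t)) \, dt]^{-1}


It now follows that

\lim_{\Delta t \to 0} \sigma_{\Delta t}(t) = \sigma(t)

exists and that


(III.28.8) \int_0^{\infty} \exp(-st) \, d\sigma(t)
= 1 + s\alpha^{-1} - [\alpha \int_0^{\infty} \exp(-st) \exp(-\mu h(t)) \, dt]^{-1}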


The interpretation of $\sigma(t)$ is immediate: it is simply the
distribution of the time interval between the “last” instant at which 0
is observed and the “first” instant at which it is observed again.

Although it seems impossible to invert (III.28.8) so as to find
$\sigma(t)$ explicitly, several interesting conclusions can be drawn from it.

First, letting $s \to 0$ and observing that


\int_0^{\infty} \exp(-\mu h(t)) \, dt = \infty
one gets
(III.28.9) \int_0^{\infty} d\sigma(t) = 1


This is simply Poincaré’s recurrence theorem.
Next, using the assumption $g(t) \to 0$ as $t \to \infty$ (hence $h(t) \to 1$,
$t \to \infty$) and the decomposition


\int_0^{\infty} \exp(-st) \exp(-\mu h(t)) \, dt
= s^{-1} \exp(-\mu) + \int_0^{\infty} \exp(-st) [\exp(-\mu h(t)) - \exp(-\mu)] \, dt
= s^{-1} \exp(-\mu) + U(s)


we obtain by differentiating (III.28.8) with respect to s and then
letting $s \to 0$


(III.28.10) \int_0^{\infty} t \, d\sigma(t) = \alpha^{-1} (\exp \mu - 1)


which is Smoluchowski’s formula for the mean recurrence time if 
observations are taken continuously. 
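
The passage from the generating-function identity (III.28.6) to the mean recurrence time can be followed numerically. The sketch below is an illustration and not part of the original argument; the function names are hypothetical. It uses Fürth's model (III.26.3) in assumed units where l/|v_0| = 1, so that h(t) = min(t, 1) and h'(0) = 1. The coefficients K_n of H/(1 + H) are obtained from the recursion K_n = c_n - \sum_{i=1}^{n-1} c_i K_{n-i} with c_i = \exp[-\mu h(i\Delta t)], and (III.28.6) then gives P_n = 1 - (K_1 + \cdots + K_n).

```python
from math import exp


def h_furth(t):
    """h(t) = 1 - g(t) for Fürth's model, in units where l / |v_0| = 1."""
    return min(t, 1.0)


def recurrence_summary(mu, dt, steps):
    """Return (sum of P_n, mean recurrence time for observations spaced dt)."""
    c = [exp(-mu * h_furth(i * dt)) for i in range(steps + 1)]
    # Coefficients of H/(1 + H), from K (1 + H) = H compared term by term.
    k = [0.0] * (steps + 1)
    for n in range(1, steps + 1):
        k[n] = c[n] - sum(c[i] * k[n - i] for i in range(1, n))
    # (III.28.6): P_1 = 1 - K_1 and P_n - P_{n-1} = -K_n for n >= 2.
    p = []
    running = 1.0
    for n in range(1, steps + 1):
        running -= k[n]
        p.append(running)
    # The distribution has jumps (P_k - P_{k+1}) / P_1 at the times k * dt,
    # so its mean is dt * (P_1 + P_2 + ...) / P_1.
    mean_discrete = dt * sum(p) / p[0]
    return sum(p), mean_discrete
```

For \mu = 1 and \Delta t = 0.05 the sum of the P_n comes out as exp(\mu) - 1 to high accuracy, and the mean for discrete observations, about 1.76, is already within a few percent of the continuous-observation value \alpha^{-1}(\exp \mu - 1) = 1.718... with \alpha = \mu h'(0) = 1.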

Higher moments can also be calculated but calculations be- 
come progressively more and more tedious. 

An interesting limiting case can be derived from (III.28.8). 
Denoting the recurrence time by T we have 


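(III.28.11) \lim_{\mu \to \infty} Prob\{T > u\alpha^{-1} (\exp \mu - 1)\} = \exp(-u)

The proof consists in noting that
Prob\{T > u\alpha^{-1} (\exp \mu - 1)\} = 1 - \sigma(u\alpha^{-1} (\exp \mu - 1))
and then showing that


\lim_{\mu \to \infty} \int_0^{\infty} \exp(-su) \, d\sigma(u\alpha^{-1} (\exp \mu - 1)) = (1 + s)^{-1}

= \int_0^{\infty} \exp(-su) \exp(-u) \, du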


from which (III.28.11) follows. 
The intuitive meaning of (III.28.8) is quite obvious. 
We feel that 


Prob\{T > u\alpha^{-1} (\exp \mu - 1)\}
is approximately
(III.28.12) P(0, 0 | \bar{0}, \Delta u \alpha^{-1} (\exp \mu - 1); \bar{0}, 2\Delta u \alpha^{-1} (\exp \mu - 1); \ldots;
\bar{0}, n\Delta u \alpha^{-1} (\exp \mu - 1)) \div P(0, 0 | \bar{0}, \Delta u \alpha^{-1} (\exp \mu - 1))


where $n\Delta u = u$.

Now, for large $\mu$ one again feels that the observations at times
k\Delta u \alpha^{-1} (\exp \mu - 1)
are nearly independent so that (III.28.11) should be approximately
[P(0, 0 | \bar{0}, \Delta u \alpha^{-1} (\exp \mu - 1))]^{n-1}


which in the limit $\Delta u \to 0$, $n\Delta u = u$ becomes $\exp(-u)$.

Although this argument is mathematically crude it is con- 
vincing, and it suggests a theorem of great generality to the effect 
that the recurrence time when measured on the scale of the mean 
recurrence time is exponentially distributed in the limit as the 
mean recurrence time becomes infinite. 

This is true for the Ehrenfest model and the exponential 
nature of the limiting distribution explains the already noted 
phenomenon that for large recurrence times the relative fluctua- 
tion of the recurrence time is approximately 1 (100 %). 

One final remark. If one considers a gas so dilute that collisions
between molecules can be neglected (Knudsen gas) and if A
is a convex set then we are led to a realistic persistent process n_A(t).

Since the velocities in the gas obey the Maxwellian distribution


(III.28.13) m^{3/2} (2\pi kT)^{-3/2} \exp[-m v \cdot v (2kT)^{-1}] = f(v)


it is easily seen that


g(t) = |A|^{-1} \int_A dr \int dv \, \psi(r + vt) f(v),


where $\psi(r)$ is the characteristic function of the set A (see III.22.1).

Formula (III.28.8) is applicable, and it is thus possible to
determine, in principle, the distribution of time intervals between
consecutive instants during which the set A is empty of gas particles!


CHAPTER IV 


Integration in Function Spaces 
and Some Applications 


1. So far the measure-theoretic background of the problems we 
have treated has been primitive and relatively trivial. We now 
come to considerations where the underlying measure theory is 
much more sophisticated. 

The starting point is a result, first derived by Einstein and
Smoluchowski, to the effect that the probability that a free Brownian
particle starting from x = 0 will be found between $\alpha_1$ and $\beta_1$ at
time t_1, between $\alpha_2$ and $\beta_2$ at time t_2, etc., and between $\alpha_n$ and $\beta_n$ at
time t_n is given by the formula:


(IV.1.1) \int_{\alpha_1}^{\beta_1} \cdots \int_{\alpha_n}^{\beta_n} P(0 | x_1; t_1) P(x_1 | x_2; t_2 - t_1)
\cdots P(x_{n-1} | x_n; t_n - t_{n-1}) \, dx_1 \cdots dx_n

where
(IV.1.2) 0 < t_1 < t_2 < \cdots < t_n
and
(IV.1.3) P(x | y; t) = (2\pi t)^{-1/2} \exp[-\tfrac12 (y - x)^2 t^{-1}]
Actually,

P(x | y; t) = \tfrac12 (\pi D t)^{-1/2} \exp[-\tfrac14 (y - x)^2 D^{-1} t^{-1}]


where D (“diffusion constant”) is related simply to the viscosity 
and temperature of the medium in which Brownian motion takes 
place, and to the radius of the Brownian particle (assumed spher- 
ical). The formula for D also contains the Avogadro number 
(through the intervention of Boltzmann’s constant k) and hence 
experiments on Brownian particles lead to a determination of this 
number. Indeed it was this possibility that made the Einstein- 
Smoluchowski theory so exciting to physicists. From the purely 
mathematical point of view the physical meaning of D is not
relevant and hence we have chosen units in which


D = \tfrac12


One can now try to fit the theory of Brownian motion into 
the general scheme discussed in Section 1 of Chapter I. 

For the sample space S, we first take the set of all real-valued
functions x(t) ($0 \le t < \infty$), normalized by the condition x(0) = 0.

For “elementary sets,” we take sets of functions defined by
conditions of the form


(IV.1.4) \alpha_1 < x(t_1) < \beta_1, \ldots, \alpha_n < x(t_n) < \beta_n; \quad 0 < t_1 < \cdots < t_n


(Wiener calls such sets of functions ‘‘quasi-intervals.’’) 

The measure (Wiener measure) assigned to the set (IV.1.4) 
is given by the Einstein-Smoluchowski formula (IV.1.1). 

It is now easily verified that this particular assignment of 
measures to elementary sets satisfies the usual consistency con- 
ditions. 

In fact, it is seen that consistency conditions are implied by 
the relation 


(IV.1.6) P(x | y; t) = \int_{-\infty}^{\infty} P(x | z; \tau) P(z | y; t - \tau) \, dz, \quad 0 < \tau < t
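
For the Gaussian kernel (IV.1.3) the relation (IV.1.6) can be confirmed by direct quadrature. The sketch below is merely an illustration; the function names, the integration window and the number of steps are assumptions made for the example.

```python
from math import exp, pi, sqrt


def kernel(x, y, t):
    """P(x | y; t) of (IV.1.3), i.e. the units in which D = 1/2."""
    return exp(-0.5 * (y - x) ** 2 / t) / sqrt(2.0 * pi * t)


def composed_kernel(x, y, t, tau, half_width=12.0, steps=4000):
    """Trapezoidal value of the integral over z of P(x|z;tau) P(z|y;t-tau)."""
    lo = min(x, y) - half_width
    hi = max(x, y) + half_width
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * kernel(x, z, tau) * kernel(z, y, t - tau)
    return total * h
```

For instance `composed_kernel(0.3, -1.1, 2.0, 0.7)` and `kernel(0.3, -1.1, 2.0)` agree to many decimal places, whatever intermediate time between 0 and t is chosen.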


Once consistency is established, by a general theorem of 
Kolmogoroff (mentioned briefly in Section 6 of Chapter III), one 
can construct a completely additive measure in the space of all
real-valued functions x(t), (x(0) = 0).

So constructed, this measure is nearly useless because many 
sets of direct relevance and interest are nonmeasurable. 

For instance, the set C of continuous functions turns out to 
be nonmeasurable. 

In fact, one can show that 


(IV.1.7) \mu^*(C) = 1, \quad \mu_*(C) = 0


where $\mu^*$ and $\mu_*$ denote the outer and the inner measures
respectively.
It has been shown by Doob that if one restricts oneself to


continuous functions and maintains the measures (IV.1.1) for the 
sets (IV.1.4) (but now only continuous functions are allowed in 
the sets (IV.1.4)) then a completely additive measure $\mu$ with


(IV.1.8) \mu(C) = 1


can be constructed. 


2. Without going into proofs of the preceding statements let us 
try to explain what is at issue. 

Starting with “elementary sets,” whose measures are pre- 
scribed in advance, we construct sets by taking unions (finite or 
denumerable) of elementary sets and complements of sets already 
constructed. 

In this way we obtain what is called a Borel field generated by 
the elementary sets. 

Now, by the obvious rules (see Section 1 of Chapter I), one 
can assign measures (extend the measure as one calls this opera- 
tion) to all sets of the Borel field. The crux of Kolmogoroff’s 
theorem is to show that this operation preserves consistency. In 
other words, one must show that any two distinct representations 
of a set in the field (representations which may involve denumer- 
able operations) lead to the same measure. 

Doob’s aforementioned result can be formulated as follows. 
Adjoin to the elementary sets the set C of continuous functions and 
postulate its measure to be 1. Construct again the Borel field and 
extend the measure as before. One gets this way a new measure 
on the set of all real-valued functions x(t), (x(0) = 0) which is 
entirely concentrated on the space C. 

The crux here is to show (and it is a lesser “crux” than that
in Kolmogoroff’s proof) that $\mu(C) = 1$ is not inconsistent with the
assignment of measures to “quasi-intervals” (IV.1.4).

One can now forget about the space of all real-valued functions
and consider the foregoing measure as having been defined on C. 
This is the Wiener measure in the space of continuous functions 
x(t), normalized by x(0) = 0. 

Wiener, who first introduced this measure during the early
twenties, did it in a much less abstract way.


He constructed an explicit mapping of C into the interval 
(0, 1) (actually into (0,1) minus a set of measure 0) such that 
“quasi-intervals” (IV.1.4) mapped into sets of ordinary Lebesgue
measure given by (IV.1.1). 

Wiener’s method, although extremely appealing (especially 
to an analyst), has certain disadvantages (which we shall point out 
in the sequel) and is out of fashion today. 

However it should not be forgotten that it was Wiener who 
had the idea first, and, because of this, his contribution to the sub- 
ject is still the greatest. 


3. Since consistency of assignment of measures to “quasi-intervals”
(IV.1.4) is implied by (IV.1.6) (the Chapman-Kolmogoroff
equation to mathematicians, the Smoluchowski equation to 
physicists—the latter being more just on historical grounds), one 
can inquire into other solutions of this equation on which measures 
in appropriate function spaces can be constructed. 

Without going into details of the vast subject of classifying 
solutions of (IV.1.6), let us mention briefly an interesting class of 
solutions, namely 


(IV.3.1) P(x | y; t)
= (2\pi)^{-1} \int_{-\infty}^{\infty} \exp[i\xi(y - x)] \exp[-t |\xi|^{\alpha}] \, d\xi, \quad 0 < \alpha \le 2


the so-called “stable densities of exponent $\alpha$.”

The case $\alpha = 2$ corresponds to Brownian motion and has been
discussed briefly in Sections 1 and 2.

The cases $\alpha < 2$ are interesting because instead of (IV.1.7)
one gets here
(IV.3.2) \mu^*(C) = 0
and hence there is no possibility of using (IV.3.1) as a basis of a
measure in C.

However, it was shown by P. Lévy (and later by Doob) that
(IV.3.1) can be used to introduce a measure in the space 𝒟 of left
(or right) continuous functions having only discontinuities of the
first kind.


It does not seem easy to approach this case by the method of 
Wiener (i.e., exhibiting an explicit mapping into a set in which it 
is easy to construct a measure), and this is the main disadvantage 
of Wiener’s method. 


4. Once a completely additive measure in C has been constructed
it is a matter of routine to define an integral (Wiener integral)
which has all the basic properties of the Lebesgue integral.⁴⁴

We shall use the expectation symbol E for the integral and 
Prob for the measure. 

Let V (x) be an continuous function defined in (— œ, o0) and 
assume V(x) = 0. 

Consider now the Wiener integral 


(IV.4.1) E {exp | — f V (a(x) dr |) 


Does it exist? 

Since the integrand is bounded it suffices to show that it is 
measurable. 

Since

(IV.4.2)  \int_0^t V(x(\tau)) \, d\tau = \lim_{n \to \infty} t n^{-1} \sum_{k=1}^{n} V(x(kt/n))

(remember that x(τ) ∈ C and that we have assumed that V(x) is
continuous!), and since

    t n^{-1} \sum_{k=1}^{n} V(x(kt/n))

is clearly measurable, the measurability of ∫_0^t V(x(τ)) dτ and hence
of

    \exp[-\int_0^t V(x(\tau)) \, d\tau]

follows.


44 There is a group of mathematicians who consider it a crime to 
introduce measure first and the integral next. Although this point is 
relatively minor and hardly worth a polemic, it may be pointed out that in 
probability theory, which is one of the main “customers” of the theory of
measure and integration, the traditional and “old-fashioned” order
(measure first, integral next) is certainly the natural one. 


166 PROBABILITY IN PHYSICAL SCIENCES 


From (IV.4.2) and the theorem on bounded convergence it
also follows that

(IV.4.3)  E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]\} = \lim_{n \to \infty} E\{\exp[-t n^{-1} \sum_{k=1}^{n} V(x(kt/n))]\}

the existence of the limit being part of the assertion.
On the other hand, it is clear from (IV.1.1) that

(IV.4.4)  E\{\exp[-t n^{-1} \sum_{k=1}^{n} V(x(kt/n))]\}
          = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp[-t n^{-1} \sum_{k=1}^{n} V(x_k)] \, P(0 \mid x_1; t/n) P(x_1 \mid x_2; t/n) \cdots P(x_{n-1} \mid x_n; t/n) \, dx_1 \cdots dx_n

and consequently we have the following conclusion:
The limit

(IV.4.5)  \lim_{n \to \infty} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp[-t n^{-1} \sum_{k=1}^{n} V(x_k)] \, P(0 \mid x_1; t/n) P(x_1 \mid x_2; t/n) \cdots P(x_{n-1} \mid x_n; t/n) \, dx_1 \cdots dx_n

exists and is equal to

(IV.4.6)  E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]\}

The existence of the limit (IV.4.5) emerges here as a simple
consequence of measurability of a certain functional, and we have
here an example (primitive but important!) of analytic capabilities
hidden in measure theory!⁴⁵
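
The discretization in (IV.4.3) can be tried out numerically. The sketch below is only an illustration under stated assumptions: it samples Brownian paths from independent Gaussian increments, forms the Riemann sum t n^{-1} Σ V(x(kt/n)), and averages the exponential. The test potential V(x) = x²/2 is chosen because the Wiener integral then has the known closed form (cosh t)^{-1/2} (the Cameron-Martin formula); the function name, the sample sizes, and the seed are not from the text.

```python
import math
import random

def wiener_expectation(V, t, n_steps=100, n_paths=4000, seed=0):
    """Monte Carlo estimate of E{exp[-(t/n) * sum_k V(x(k t / n))]}.

    Paths start at x(0) = 0; increments over dt = t/n are N(0, dt).
    n_steps, n_paths, and seed are illustrative choices.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sd = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        x = 0.0
        riemann = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, sd)      # x(k t / n)
            riemann += V(x) * dt         # one term of t n^{-1} * sum V(x(kt/n))
        total += math.exp(-riemann)
    return total / n_paths

estimate = wiener_expectation(lambda x: 0.5 * x * x, t=1.0)
exact = 1.0 / math.sqrt(math.cosh(1.0))

if __name__ == "__main__":
    print("Monte Carlo:", estimate)
    print("closed form:", exact)
```

The estimate carries both a sampling error (of order n_paths^{-1/2}) and a small discretization error from replacing the integral by the Riemann sum.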


45 This argument is an analog of the familiar one by which the existence
of the limit

    \lim_{n \to \infty} \Bigl( \sum_{k=1}^{n} k^{-1} - \log n \Bigr)

is derived from the existence of the integral \int_0^1 (t^{-1} - [t^{-1}]) \, dt (in the
Riemann sense).
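
The analogy in footnote 45 can be made concrete with a few lines of arithmetic. On each interval (1/(k+1), 1/k] the integer part [1/t] equals k, so the integral of 1/t − [1/t] over (1/n, 1) can be summed in closed form and equals log n − (1 + 1/2 + ... + 1/n) + 1. The sketch below checks this identity numerically; the function names are assumptions for this example.

```python
import math

def harmonic_minus_log(n):
    """sum_{k=1}^{n} 1/k - log n."""
    return sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

def truncated_integral(n):
    """Integral of (1/t - floor(1/t)) over (1/n, 1), summed piece by piece.

    On (1/(k+1), 1/k] the integrand is 1/t - k, whose integral is
    log((k+1)/k) - 1/(k+1).
    """
    return sum(math.log((k + 1) / k) - 1.0 / (k + 1) for k in range(1, n))

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        # The two columns agree; both tend to Euler's constant 0.5772...
        print(n, harmonic_minus_log(n), 1.0 - truncated_integral(n))
```

Since the integrand is bounded, the integral over (0, 1) exists, and the limit of the harmonic sum minus log n exists with it.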


One could try to define (IV.4.6) as the limit (IV.4.5), the
procedure actually adopted by Feynman.

The disadvantages of such an approach from the purely
mathematical point of view are obvious, although it is appealing
on formal grounds.

Feynman writes the integral in (IV.4.5) in the form (x_0 = 0)

    (2\pi t/n)^{-n/2} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\Bigl\{ -\frac{t}{n} \sum_{k=1}^{n} \Bigl[ \frac{1}{2} \Bigl( \frac{x_k - x_{k-1}}{t/n} \Bigr)^2 + V(x_k) \Bigr] \Bigr\} \, dx_1 \cdots dx_n

which makes it obvious that the exponent is simply a discretization
of

    \int_0^t \Bigl[ \frac{1}{2} \Bigl( \frac{dx}{d\tau} \Bigr)^2 + V(x(\tau)) \Bigr] d\tau

Instead of (IV.4.6) Feynman writes

(IV.4.7)  \int \exp\Bigl\{ -\int_0^t \Bigl[ \frac{1}{2} \Bigl( \frac{dx}{d\tau} \Bigr)^2 + V(x(\tau)) \Bigr] d\tau \Bigr\} \, d(\mathrm{path})

In this form the symbolism is physically more appealing, since

    \frac{1}{2} \Bigl( \frac{dx}{d\tau} \Bigr)^2 + V(x)

is the Hamiltonian of a particle of mass 1 moving in a field of
potential V(x).

Actually Feynman in his approach to nonrelativistic quantum
mechanics is led to considering

(IV.4.8)  \int \exp\Bigl\{ \frac{i}{\hbar} \int_0^t \Bigl[ \frac{1}{2} \Bigl( \frac{dx}{d\tau} \Bigr)^2 - V(x(\tau)) \Bigr] d\tau \Bigr\} \, d(\mathrm{path})

where

    \hbar = \frac{h}{2\pi}    (h = Planck’s constant)

and

(IV.4.9)  \int_0^t \Bigl[ \frac{1}{2} \Bigl( \frac{dx}{d\tau} \Bigr)^2 - V(x(\tau)) \Bigr] d\tau

is the classical action along the path x(τ). Because of i (= √−1)
in the exponent, Feynman’s theory is not easily made rigorous.
On the other hand, Feynman integrals of the type (IV.4.7) are
most conveniently handled when transformed into the form
(IV.4.6).


5. We shall show now that the evaluation of the Wiener integral 


(IV.5.1)  E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]\}


can be reduced to solving a differential equation closely allied with 
Schrödinger’s equation. 

Nowadays there are many ways of establishing this connection,
and the one we have chosen is far from the “slickest.” Its
main advantage is that it requires the least prior knowledge.

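We start with an additional restriction on V(x), namely that
it also be bounded from above:

    0 \le V(x) < M

Now

    \exp[-\int_0^t V(x(\tau)) \, d\tau] = \sum_{k=0}^{\infty} (-1)^k \Bigl[ \int_0^t V(x(\tau)) \, d\tau \Bigr]^k / k!

and since

(IV.5.2)  0 \le \int_0^t V(x(\tau)) \, d\tau < Mt

we have

(IV.5.3)  E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]\} = \sum_{k=0}^{\infty} (-1)^k \mu_k(t) / k!

where

(IV.5.4)  \mu_k(t) = E\Bigl\{ \Bigl[ \int_0^t V(x(\tau)) \, d\tau \Bigr]^k \Bigr\}

are the moments of ∫_0^t V(x(τ)) dτ. Let us calculate them for k = 1 and k = 2 in order to see what
is going on.
For k = 1 we have⁴⁶

    E\Bigl\{ \int_0^t V(x(\tau)) \, d\tau \Bigr\} = \int_0^t E\{V(x(\tau))\} \, d\tau
      = \int_0^t \int_{-\infty}^{\infty} V(\xi) (2\pi\tau)^{-1/2} \exp(-\xi^2/2\tau) \, d\xi \, d\tau

For k = 2, the calculation is slightly more complicated.

    E\Bigl\{ \Bigl( \int_0^t V(x(\tau)) \, d\tau \Bigr)^2 \Bigr\}
      = 2! \, E\Bigl\{ \int_0^t \int_0^{\tau_2} V(x(\tau_1)) V(x(\tau_2)) \, d\tau_1 \, d\tau_2 \Bigr\}
      = 2! \int_0^t \int_0^{\tau_2} E\{V(x(\tau_1)) V(x(\tau_2))\} \, d\tau_1 \, d\tau_2
      = 2! \int_0^t \int_0^{\tau_2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} V(\xi_1) V(\xi_2) (2\pi\tau_1)^{-1/2} \exp(-\xi_1^2/2\tau_1)
          \cdot [2\pi(\tau_2 - \tau_1)]^{-1/2} \exp[-(\xi_2 - \xi_1)^2 / 2(\tau_2 - \tau_1)] \, d\xi_1 \, d\xi_2 \, d\tau_1 \, d\tau_2

It is now perfectly clear how to proceed for general k.
Let us define functions Q_n(x, t) as follows:

(IV.5.5)  Q_0(x, t) = (2\pi t)^{-1/2} \exp(-x^2/2t)

(IV.5.6)  Q_{n+1}(x, t) = \int_0^t \int_{-\infty}^{\infty} [2\pi(t - \tau)]^{-1/2} \exp[-(x - \xi)^2 / 2(t - \tau)] \, V(\xi) \, Q_n(\xi, \tau) \, d\xi \, d\tau

and observe that

(IV.5.7)  \mu_k(t) = k! \int_{-\infty}^{\infty} Q_k(x, t) \, dx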

Moreover, since we have assumed that 0 ≤ V(x) < M it follows
by induction that

(IV.5.8)  0 \le Q_n(x, t) \le \frac{(Mt)^n}{n!} Q_0(x, t)

Set

(IV.5.9)  Q(x, t) = \sum_{n=0}^{\infty} (-1)^n Q_n(x, t)


46 We are appealing here to Fubini’s theorem. 


where because of (IV.5.8) the series converges for all x and t ≠ 0.
Clearly

    |Q(x, t)| \le \exp(Mt) \, Q_0(x, t)

and because of (IV.5.5) and (IV.5.6), Q satisfies the integral equation

(IV.5.10)  Q(x, t) + (2\pi)^{-1/2} \int_0^t \int_{-\infty}^{\infty} (t - \tau)^{-1/2} \exp[-(x - \xi)^2 / 2(t - \tau)] \, V(\xi) \, Q(\xi, \tau) \, d\xi \, d\tau = Q_0(x, t)

It also follows by combining (IV.5.9), (IV.5.7), and (IV.5.3)
that

(IV.5.11)  E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]\} = \int_{-\infty}^{\infty} Q(x, t) \, dx

In taking the expectation

    E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]\}

we integrate over the whole of space C.
If we integrate over the part of C defined by the condition

    a < x(t) < b

we can use the self-explanatory symbol

    E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]; \; a < x(t) < b\}

By minor modification of the argument we get

(IV.5.12)  E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]; \; a < x(t) < b\} = \int_a^b Q(x, t) \, dx

which implies, in particular, that

(IV.5.13)  Q(x, t) \ge 0
We can now remove the condition

    V(x) < M

In fact, set

    V_M(x) = \begin{cases} V(x), & \text{if } V(x) \le M \\ M, & \text{if } V(x) > M \end{cases}

and denote the corresponding Q function by Q^{(M)}(x, t).

By complete additivity of the Wiener measure we have

    \lim_{M \to \infty} E\{\exp[-\int_0^t V_M(x(\tau)) \, d\tau]; \; a < x(t) < b\}
      = E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]; \; a < x(t) < b\}

and it follows easily from (IV.5.12) that as M → ∞ the functions
Q^{(M)}(x, t) form a decreasing sequence.
It further follows that

    \lim_{M \to \infty} Q^{(M)}(x, t) = Q(x, t)

exists and that Q satisfies the basic integral equation (IV.5.10).
The integral equation (IV.5.10) implies that Q satisfies the
differential equation

(IV.5.14)  \frac{\partial Q}{\partial t} = \frac{1}{2} \frac{\partial^2 Q}{\partial x^2} - V(x) Q

and (IV.5.12) implies immediately the initial condition

(IV.5.15)  Q(x, t) \to \delta(x), \quad t \to 0

i.e.,

    \lim_{t \to 0} \int_{-\epsilon}^{\epsilon} Q(x, t) \, dx = 1

Actually it is easier to take Laplace transforms of (IV.5.10)
obtaining

(IV.5.16)  \psi(x) + (2s)^{-1/2} \int_{-\infty}^{\infty} \exp(-(2s)^{1/2} |x - \xi|) \, V(\xi) \, \psi(\xi) \, d\xi = (2s)^{-1/2} \exp(-(2s)^{1/2} |x|)

where

(IV.5.17)  \psi(x) = \int_0^{\infty} Q(x, t) \exp(-st) \, dt, \quad s > 0

It is now a simple matter to see that

(IV.5.18)  \tfrac{1}{2} \psi'' - (s + V(x)) \psi = 0

and that ψ(x) satisfies the following conditions:

(a) ψ → 0, x → ±∞
(b) ψ' continuous except at x = 0
(c) ψ'(−0) − ψ'(+0) = 2

Minor and obvious modifications must be introduced if V(x)
is allowed to have a finite number of discontinuities.
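
As a check on (IV.5.16) through (IV.5.18) and the conditions (a) through (c), one can work out the simplest case by hand. The constant potential V(x) ≡ c ≥ 0 used below is an illustrative assumption, not an example from the text; for it the time integral equals ct on every path, so that Q(x, t) = e^{-ct} Q_0(x, t).

```latex
% Worked check for the constant potential V(x) = c >= 0 (illustrative assumption).
\begin{align*}
\psi(x) &= \int_0^\infty e^{-st}\, e^{-ct}\, (2\pi t)^{-1/2} e^{-x^2/2t}\, dt
         = \bigl(2(s+c)\bigr)^{-1/2} \exp\!\bigl(-(2(s+c))^{1/2}\,|x|\bigr), \\
\tfrac{1}{2}\,\psi''(x) &= (s+c)\,\psi(x) \qquad (x \neq 0), \\
\psi'(-0) - \psi'(+0) &= 2\,\bigl(2(s+c)\bigr)^{1/2}\,\bigl(2(s+c)\bigr)^{-1/2} = 2, \\
\int_{-\infty}^{\infty} \psi(x)\, dx &= \frac{1}{s+c}
         = \int_0^\infty e^{-st}\, e^{-ct}\, dt .
\end{align*}
```

The last line agrees with (IV.5.11): the Laplace transform of E{exp[−ct]} = e^{−ct} is 1/(s + c).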


6. Since Q(x, t) is continuous in x (as clearly implied by (IV.5.10))
we have from (IV.5.12)

(IV.6.1)  \lim_{\epsilon \to 0} \epsilon^{-1} E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]; \; a < x(t) < a + \epsilon\} = Q(a, t)

One defines the conditional expectation

    E\{\exp[-\int_0^t V(x(\tau)) \, d\tau] \mid x(t) = a\}

as the limit

    \lim_{\epsilon \to 0} \frac{E\{\exp[-\int_0^t V(x(\tau)) \, d\tau]; \; a < x(t) < a + \epsilon\}}{\mathrm{Prob}\{a < x(t) < a + \epsilon\}}

Since

    \lim_{\epsilon \to 0} \epsilon^{-1} \, \mathrm{Prob}\{a < x(t) < a + \epsilon\} = (2\pi t)^{-1/2} \exp(-a^2/2t)

we get from (IV.6.1)

(IV.6.2)  (2\pi t)^{-1/2} \exp(-a^2/2t) \, E\{\exp[-\int_0^t V(x(\tau)) \, d\tau] \mid x(t) = a\} = Q(a, t)

Suppose now that

(IV.6.3)  V(x) \to \infty, \quad x \to \pm\infty

Under this assumption it is well known that the eigenvalue
problem

(IV.6.4)  \tfrac{1}{2} \psi'' - V(x) \psi = -\lambda \psi, \quad \psi \in L^2(-\infty, \infty)

yields a discrete spectrum

    \lambda_1, \lambda_2, \ldots

with corresponding normalized eigenfunctions

    \psi_1(x), \psi_2(x), \ldots

It is also known that one can write Q in the following form

(IV.6.5)  Q(a, t) = \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j(a) \, \psi_j(0)

Combining (IV.6.5) with (IV.6.2) we get

(IV.6.6)  (2\pi t)^{-1/2} \exp[-a^2/2t] \, E\{\exp[-\int_0^t V(x(\tau)) \, d\tau] \mid x(t) = a\}
          = \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j(a) \, \psi_j(0)


and by a slight extension

(IV.6.7)  (2\pi t)^{-1/2} \exp[-(a - \xi)^2/2t] \, E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = a - \xi\}
          = \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j(a) \, \psi_j(\xi)

What we have succeeded in doing is to express a purely
classical quantity

    \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j(a) \, \psi_j(\xi)

in terms of an integral over a space of functions.
What is remarkable is that important properties of

    \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j(a) \, \psi_j(\xi)

can be immediately derived from the probabilistic expression
(IV.6.7).
In fact, setting a = ξ we get

(IV.6.8)  (2\pi t)^{-1/2} E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = 0\} = \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j^2(\xi)

and it is almost obvious (the proof is quite easy) that

(IV.6.9)  \lim_{t \to 0} E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = 0\} = 1

Thus for t → 0

    \sum_{j=1}^{\infty} \exp(-\lambda_j t) \, \psi_j^2(\xi) \sim (2\pi t)^{-1/2}

and the classical Tauberian theorem implies that

(IV.6.10)  \sum_{\lambda_j < \lambda} \psi_j^2(\xi) \sim 2^{1/2} \pi^{-1} \lambda^{1/2}, \quad \lambda \to \infty

This is a highly nontrivial analytic result which emerges here
as a consequence of the nearly obvious relation (IV.6.9).


Integrating (IV.6.8) on ξ and using the fact that the ψ’s are
normalized we get

(IV.6.11)  \sum_{j=1}^{\infty} \exp(-\lambda_j t) = (2\pi t)^{-1/2} \int_{-\infty}^{\infty} E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = 0\} \, d\xi

Now

    E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = 0\}

can be looked upon as an integral over the space of continuous
paths such that

    x(0) = 0 and x(t) = 0

For small t it seems natural to approximate x(τ) in 0 ≤ τ ≤ t
by 0.
Thus one might expect that

    E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = 0\} \sim E\{\exp[-tV(\xi)] \mid x(t) = 0\} = \exp[-tV(\xi)]

and more precisely that

(IV.6.12)  \int_{-\infty}^{\infty} E\{\exp[-\int_0^t V(\xi + x(\tau)) \, d\tau] \mid x(t) = 0\} \, d\xi \sim \int_{-\infty}^{\infty} \exp[-tV(\xi)] \, d\xi

provided, of course, V(x) grows sufficiently fast to insure the existence
for all t > 0 of the integral on the right.

A rigorous justification of (IV.6.12) has been given by Ray; it
consists mainly in showing that the probability that x(τ) is “large”
somewhere in (0, t) is “small.”

Thus for t → 0

(IV.6.13)  \sum_{j=1}^{\infty} \exp(-\lambda_j t) \sim (2\pi t)^{-1/2} \int_{-\infty}^{\infty} \exp[-tV(\xi)] \, d\xi
           = (2\pi)^{-1} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp[-t(\tfrac{1}{2}\eta^2 + V(\xi))] \, d\xi \, d\eta

Let

    N(\lambda) = \sum_{\lambda_j < \lambda} 1

and

    B(\lambda) = \text{area of the region } \tfrac{1}{2}\eta^2 + V(\xi) < \lambda

Then (IV.6.13) can be rewritten in the form

    \int_0^{\infty} \exp(-\lambda t) \, dN(\lambda) \sim (2\pi)^{-1} \int_0^{\infty} \exp(-\lambda t) \, dB(\lambda), \quad t \to 0

and one might expect that, under appropriate Tauberian conditions,
one should have

(IV.6.14)  N(\lambda) \sim (2\pi)^{-1} B(\lambda), \quad \lambda \to \infty

This is a result of fundamental importance in quantum mechanics
and it also has received considerable attention in purely mathematical
literature.

In our formulation, the intuitive background of (IV.6.14) is 
brought out with extreme clarity and the rigorous treatment (as 
given by Ray) follows the lines suggested by the foregoing heuristic 
argument. 
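
A quick way to see (IV.6.13) and (IV.6.14) at work is the harmonic oscillator, V(x) = x²/2, which is an example chosen here and not one treated in the text. For this potential the eigenvalues of (IV.6.4) are λ_j = j − 1/2 (j = 1, 2, ...), and the region η²/2 + ξ²/2 < λ is a disk of area 2πλ. The sketch below compares both sides; all function names are assumptions.

```python
import math

def eigenvalue(j):
    """j-th eigenvalue (j = 1, 2, ...) of (IV.6.4) for V(x) = x**2 / 2."""
    return j - 0.5

def trace(t, terms=20000):
    """Left side of (IV.6.13): sum_j exp(-lambda_j t), truncated after `terms` terms."""
    return sum(math.exp(-eigenvalue(j) * t) for j in range(1, terms + 1))

def classical(t):
    """Right side of (IV.6.13): (2 pi t)^(-1/2) * integral of exp(-t xi^2 / 2) d(xi)."""
    return (2.0 * math.pi * t) ** -0.5 * math.sqrt(2.0 * math.pi / t)

def N(lam):
    """Number of eigenvalues strictly below lam."""
    return max(0, math.ceil(lam + 0.5) - 1)

def B(lam):
    """Area of the disk eta^2/2 + xi^2/2 < lam."""
    return 2.0 * math.pi * lam

if __name__ == "__main__":
    for t in (1.0, 0.1, 0.01):
        print(t, trace(t) / classical(t))        # ratio tends to 1 as t -> 0
    for lam in (10.0, 100.0, 1000.0):
        print(lam, N(lam), B(lam) / (2.0 * math.pi))
```

In this special case the agreement in (IV.6.14) is exact up to rounding to an integer, which is of course not typical.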


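7. A closely related application of (IV.6.11) is to the calculation
of the partition function (quantum mechanical)

(IV.7.1)  Z = \sum_{n=1}^{\infty} \exp(-\beta E_n)

where

    \beta = (kT)^{-1}    (k = Boltzmann’s constant, T = absolute temperature)

and the E_n are the energy levels of a quantum-mechanical system.

In the simplest (one-dimensional) case of a particle in a field
of potential V, the energy levels are the eigenvalues of Schrödinger’s
equation

(IV.7.2)  -\tfrac{1}{2} \hbar^2 m^{-1} \frac{d^2\psi}{dx^2} + V(x) \psi = E \psi

Setting

    x = \hbar m^{-1/2} y

we obtain

    \frac{1}{2} \frac{d^2\psi}{dy^2} - V(\hbar m^{-1/2} y) \psi = -E \psi

which is (IV.6.4) with V(x) replaced by V(ħ m^{-1/2} x). Thus, by
(IV.6.11),

    Z = \sum_{n=1}^{\infty} \exp(-\beta E_n)
      = (2\pi\beta)^{-1/2} \int_{-\infty}^{\infty} E\{\exp[-\int_0^{\beta} V[\hbar m^{-1/2}(\xi + x(\tau))] \, d\tau] \mid x(\beta) = 0\} \, d\xi

or by a simple change of variables

(IV.7.3)  Z = m^{1/2} \hbar^{-1} (2\pi\beta)^{-1/2} \int_{-\infty}^{\infty} E\{\exp[-\int_0^{\beta} V(\xi + \hbar m^{-1/2} x(\tau)) \, d\tau] \mid x(\beta) = 0\} \, d\xi

One can now expand Z as a power series in ħ, namely

    Z = Z_0 + Z_1 \hbar + Z_2 \hbar^2 + \cdots

as follows. Write

    \int_0^{\beta} V(\xi + \hbar m^{-1/2} x(\tau)) \, d\tau = \beta V(\xi) + \hbar m^{-1/2} V'(\xi) \int_0^{\beta} x(\tau) \, d\tau
        + \tfrac{1}{2} V''(\xi) \hbar^2 m^{-1} \int_0^{\beta} x^2(\tau) \, d\tau + \cdots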


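and hence

    \exp[-\int_0^{\beta} V(\xi + \hbar m^{-1/2} x(\tau)) \, d\tau]
      = \exp[-\beta V(\xi)] \Bigl\{ 1 - \hbar m^{-1/2} V'(\xi) \int_0^{\beta} x(\tau) \, d\tau
        + \hbar^2 \Bigl( \tfrac{1}{2} m^{-1} [V'(\xi)]^2 \Bigl[ \int_0^{\beta} x(\tau) \, d\tau \Bigr]^2 - \tfrac{1}{2} m^{-1} V''(\xi) \int_0^{\beta} x^2(\tau) \, d\tau \Bigr) - \cdots \Bigr\}

We must now calculate

    E\{\int_0^{\beta} x(\tau) \, d\tau \mid x(\beta) = 0\}

    E\{(\int_0^{\beta} x(\tau) \, d\tau)^2 \mid x(\beta) = 0\}

    E\{\int_0^{\beta} x^2(\tau) \, d\tau \mid x(\beta) = 0\}

We carry out the computation of

    E\{(\int_0^{\beta} x(\tau) \, d\tau)^2 \mid x(\beta) = 0\}

leaving the other two to the reader.
We have

    E\{(\int_0^{\beta} x(\tau) \, d\tau)^2 \mid x(\beta) = 0\} = 2 E\{\int_0^{\beta} \int_0^{\tau_2} x(\tau_1) x(\tau_2) \, d\tau_1 \, d\tau_2 \mid x(\beta) = 0\}
      = 2 \int_0^{\beta} \int_0^{\tau_2} E\{x(\tau_1) x(\tau_2) \mid x(\beta) = 0\} \, d\tau_1 \, d\tau_2

and so we need only

    E\{x(\tau_1) x(\tau_2) \mid x(\beta) = 0\}, \quad 0 \le \tau_1 \le \tau_2 \le \beta

By definition of conditional expectation

    E\{x(\tau_1) x(\tau_2) \mid x(\beta) = 0\}
      = \lim_{\epsilon \to 0} \bigl[ E\{x(\tau_1) x(\tau_2); \; -\epsilon < x(\beta) < \epsilon\} \div \mathrm{Prob}\{-\epsilon < x(\beta) < \epsilon\} \bigr]

      = \lim_{\epsilon \to 0} \Bigl[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \int_{-\epsilon}^{\epsilon} x_1 x_2 \, (2\pi\tau_1)^{-1/2} \exp[-x_1^2/2\tau_1]
          \cdot [2\pi(\tau_2 - \tau_1)]^{-1/2} \exp[-(x_2 - x_1)^2 / 2(\tau_2 - \tau_1)]
          \cdot [2\pi(\beta - \tau_2)]^{-1/2} \exp[-(x_3 - x_2)^2 / 2(\beta - \tau_2)] \, dx_1 \, dx_2 \, dx_3
          \div (2\pi\beta)^{-1/2} \int_{-\epsilon}^{\epsilon} \exp(-x_3^2/2\beta) \, dx_3 \Bigr]

      = (2\pi\beta)^{1/2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x_1 x_2 \exp\Bigl[ -\Bigl( \frac{x_1^2}{2\tau_1} + \frac{(x_2 - x_1)^2}{2(\tau_2 - \tau_1)} + \frac{x_2^2}{2(\beta - \tau_2)} \Bigr) \Bigr]
          \cdot (2\pi)^{-3/2} [\tau_1 (\tau_2 - \tau_1)(\beta - \tau_2)]^{-1/2} \, dx_1 \, dx_2

      = \tau_1 - \tau_1 \tau_2 \beta^{-1}

and consequently

    E\{(\int_0^{\beta} x(\tau) \, d\tau)^2 \mid x(\beta) = 0\} = 2 \int_0^{\beta} \int_0^{\tau_2} (\tau_1 - \tau_1 \tau_2 \beta^{-1}) \, d\tau_1 \, d\tau_2 = \beta^3/12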
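
The value β³/12 can be checked by direct quadrature of the conditional covariance τ_1 − τ_1 τ_2/β found above. The sketch below uses a midpoint rule on the full square 0 ≤ s, u ≤ β, which by symmetry equals twice the integral over the triangle τ_1 ≤ τ_2; the grid size and the function names are assumptions for this example. The same covariance on the diagonal also gives the moment of the integral of x², which for this covariance works out to β²/6.

```python
def bridge_cov(s, u, beta):
    """E{x(s) x(u) | x(beta) = 0} = min(s,u) - min(s,u) * max(s,u) / beta."""
    lo, hi = min(s, u), max(s, u)
    return lo - lo * hi / beta

def moment_of_integral_squared(beta, n=400):
    """E{(int_0^beta x dtau)^2 | x(beta) = 0} by a midpoint rule on the square."""
    h = beta / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        for j in range(n):
            u = (j + 0.5) * h
            total += bridge_cov(s, u, beta)
    return total * h * h

def moment_of_integral_of_square(beta, n=400):
    """E{int_0^beta x^2 dtau | x(beta) = 0} = int_0^beta (tau - tau^2 / beta) dtau."""
    h = beta / n
    return sum(bridge_cov((i + 0.5) * h, (i + 0.5) * h, beta) for i in range(n)) * h

if __name__ == "__main__":
    beta = 2.0
    print(moment_of_integral_squared(beta), beta ** 3 / 12.0)
    print(moment_of_integral_of_square(beta), beta ** 2 / 6.0)
```

The first conditional moment, E{∫ x dτ | x(β) = 0}, vanishes by the symmetry x → −x and needs no computation.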


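Similarly we get

    E\{\int_0^{\beta} x(\tau) \, d\tau \mid x(\beta) = 0\} = 0, \qquad E\{\int_0^{\beta} x^2(\tau) \, d\tau \mid x(\beta) = 0\} = \beta^2/6

Finally,

    Z = \hbar^{-1} m^{1/2} (2\pi\beta)^{-1/2} \Bigl[ \int_{-\infty}^{\infty} \exp[-\beta V(\xi)] \, d\xi
          + \hbar^2 m^{-1} \Bigl( \frac{\beta^3}{24} \int_{-\infty}^{\infty} [V'(\xi)]^2 \exp[-\beta V(\xi)] \, d\xi
          - \frac{\beta^2}{12} \int_{-\infty}^{\infty} V''(\xi) \exp[-\beta V(\xi)] \, d\xi \Bigr) + \cdots \Bigr]

      = \hbar^{-1} m^{1/2} (2\pi\beta)^{-1/2} \Bigl[ \int_{-\infty}^{\infty} \exp[-\beta V(\xi)] \, d\xi
          - \frac{\hbar^2 \beta^3 m^{-1}}{24} \int_{-\infty}^{\infty} [V'(\xi)]^2 \exp(-\beta V(\xi)) \, d\xi - \cdots \Bigr]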


This expansion was first discovered by Wigner and Kirkwood. 
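
The first correction term can be tested on a system whose partition function is known exactly. The sketch below assumes a harmonic oscillator, V(x) = mω²x²/2, with energy levels ħω(n + 1/2) and hence Z = 1/(2 sinh(βħω/2)); this test case, the units ħ = m = ω = 1, the integration window, and the function names are all choices made for this example. It evaluates the classical term and the ħ² correction by a midpoint rule.

```python
import math

def partition_functions(V, dV, beta, hbar=1.0, m=1.0, half_width=40.0, n=40000):
    """Return (classical, corrected) values of Z.

    classical: leading term  hbar^-1 m^(1/2) (2 pi beta)^(-1/2) * int exp(-beta V)
    corrected: classical term minus (hbar^2 beta^3 / 24 m) * int V'^2 exp(-beta V),
               with the same prefactor.
    The integrals run over [-half_width, half_width] (an illustrative cutoff).
    """
    h = 2.0 * half_width / n
    i0 = 0.0   # integral of exp(-beta V)
    i1 = 0.0   # integral of V'^2 exp(-beta V)
    for k in range(n):
        xi = -half_width + (k + 0.5) * h
        w = math.exp(-beta * V(xi))
        i0 += w
        i1 += dV(xi) ** 2 * w
    i0 *= h
    i1 *= h
    prefactor = math.sqrt(m) / (hbar * math.sqrt(2.0 * math.pi * beta))
    classical = prefactor * i0
    corrected = prefactor * (i0 - hbar ** 2 * beta ** 3 / (24.0 * m) * i1)
    return classical, corrected

omega = 1.0
beta = 0.5
classical, corrected = partition_functions(
    lambda x: 0.5 * omega ** 2 * x * x,   # V(x), with m = 1
    lambda x: omega ** 2 * x,             # V'(x)
    beta,
)
exact = 1.0 / (2.0 * math.sinh(0.5 * beta * omega))   # hbar = 1

if __name__ == "__main__":
    print("classical term :", classical)
    print("with correction:", corrected)
    print("exact          :", exact)
```

For small βħω the corrected value is much closer to the exact one than the classical term alone, as the expansion suggests.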


8. As a final illustration of the applicability of integration in 
function spaces we shall consider briefly a number of questions 
related to classical potential theory. 

So far we have dealt only with the measure in the space C of
continuous functions x(τ), (x(0) = 0).

We shall now need a measure in the space of all continuous
three-dimensional curves r(τ) (r(0) = 0).

Since r(τ) is simply a triple (x(τ), y(τ), z(τ)) it is natural to
take as our measure the product measure in C × C × C.

This way of introducing measure depends on the choice of a 
cartesian coordinate system, but due to the properties of the 
normal distribution our conclusions are independent of this choice. 

The measure is constructed in such a way that if 0 < t_1 < t_2
< ... < t_n and Ω_1, Ω_2, ..., Ω_n are Borel sets in three-dimensional
Euclidean space R³, then

(IV.8.1)  \mathrm{Prob}\{r(t_1) \in \Omega_1, r(t_2) \in \Omega_2, \ldots, r(t_n) \in \Omega_n\}
          = \int_{\Omega_1} \cdots \int_{\Omega_n} P(0 \mid r_1; t_1) P(r_1 \mid r_2; t_2 - t_1) \cdots P(r_{n-1} \mid r_n; t_n - t_{n-1}) \, dr_1 \cdots dr_n

where

(IV.8.2)  P(r \mid \rho; t) = (2\pi t)^{-3/2} \exp(-\|\rho - r\|^2 / 2t)

Here ||ρ − r|| denotes the Euclidean distance between r and ρ
and dr the volume element of integration.

It should be clear that we are dealing here with the simple 
(Einstein-Smoluchowski) theory of Brownian motion in three- 
dimensional Euclidean space. 

Let Ω be a bounded closed region in R³, and let V(r) be its
characteristic function, i.e.,

(IV.8.3)  V(r) = \begin{cases} 1, & r \in \Omega \\ 0, & r \notin \Omega \end{cases}

Consider now the functional

(IV.8.4)  T_{\Omega}(y) = \int_0^{\infty} V(y + r(\tau)) \, d\tau

which represents the total time a Brownian curve y + r(τ),
starting from y, spends in Ω.
We have

    E\{T_{\Omega}(y)\} = E\{\int_0^{\infty} V(y + r(\tau)) \, d\tau\} = \int_0^{\infty} E\{V(y + r(\tau))\} \, d\tau
      = \int_0^{\infty} \mathrm{Prob}\{y + r(\tau) \in \Omega\} \, d\tau = \int_0^{\infty} \int_{\Omega} (2\pi\tau)^{-3/2} \exp(-\|r - y\|^2 / 2\tau) \, dr \, d\tau
      = (2\pi)^{-1} \int_{\Omega} \|r - y\|^{-1} \, dr < \infty

and consequently T_Ω(y) is finite for almost every path r(τ).
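
For a ball the last integral is the classical Newtonian potential of a uniform mass distribution, so the expected occupation time can be written down and compared with a direct quadrature. The sketch below assumes Ω is the ball of radius R centered at the origin (an example chosen here, not one from the text) and integrates in spherical coordinates about the axis through y; the grid sizes and names are assumptions. The closed forms used for comparison are R² − |y|²/3 for y inside the ball and 2R³/(3|y|) for y outside.

```python
import math

def expected_time_in_ball(d, R, n_r=400, n_c=400):
    """(2 pi)^-1 * integral over |r| < R of 1/|r - y| dr, where |y| = d.

    Spherical coordinates (r, c = cos(theta)) around the axis through y;
    the azimuthal integral gives 2 pi, which cancels the (2 pi)^-1 prefactor.
    Midpoint rule in r and c; n_r and n_c are illustrative.
    """
    hr = R / n_r
    hc = 2.0 / n_c
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * hr
        for j in range(n_c):
            c = -1.0 + (j + 0.5) * hc
            dist = math.sqrt(r * r + d * d - 2.0 * r * d * c)
            total += r * r / dist
    return total * hr * hc

def closed_form(d, R):
    """Newtonian potential of a uniform ball, divided by 2 pi."""
    if d <= R:
        return R * R - d * d / 3.0
    return 2.0 * R ** 3 / (3.0 * d)

if __name__ == "__main__":
    for d in (0.0, 0.5, 2.0):
        print(d, expected_time_in_ball(d, 1.0), closed_form(d, 1.0))
```

Starting from the center, the expected total time spent in the ball is therefore R²; the quadrature is least accurate for starting points inside the ball, where the integrand has an integrable singularity.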
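It is easy to calculate higher moments of T_Ω(y) and the result
is

(IV.8.5)  E\{T_{\Omega}^k(y)\} = k! \, (2\pi)^{-k} \int_{\Omega} \cdots \int_{\Omega} \|r_1 - y\|^{-1} \|r_2 - r_1\|^{-1} \cdots \|r_k - r_{k-1}\|^{-1} \, dr_1 \cdots dr_k

Formula (IV.8.5) suggests that we consider the integral equation

(IV.8.6)  (2\pi)^{-1} \int_{\Omega} \phi(\rho) \, \|\rho - r\|^{-1} \, d\rho = \lambda \phi(r), \quad r \in \Omega

The kernel

    (2\pi)^{-1} \|\rho - r\|^{-1}

is completely continuous and positive definite. Denoting by

    \lambda_1, \lambda_2, \ldots

its eigenvalues and by

    \phi_1(\rho), \phi_2(\rho), \ldots

the corresponding normalized eigenfunctions we obtain

    E\{T_{\Omega}^k(y)\} = k! \sum_{j=1}^{\infty} \lambda_j^{k-1} \int_{\Omega} \phi_j(\rho) \, d\rho \; (2\pi)^{-1} \int_{\Omega} \|\rho - y\|^{-1} \phi_j(\rho) \, d\rho, \quad k \ge 1

Introducing, with u ≥ 0, the expression

(IV.8.7)  h(y; u) = E\{\exp[-u T_{\Omega}(y)]\}

we obtain quite easily

(IV.8.8)  h(y; u) = 1 - u \sum_{j=1}^{\infty} (1 + \lambda_j u)^{-1} \int_{\Omega} \phi_j(\rho) \, d\rho \; (2\pi)^{-1} \int_{\Omega} \|\rho - y\|^{-1} \phi_j(\rho) \, d\rho

If y ∈ Ω, (IV.8.8) assumes the simpler form

(IV.8.9)  h(y; u) = 1 - \sum_{j=1}^{\infty} \lambda_j u (1 + \lambda_j u)^{-1} \int_{\Omega} \phi_j(\rho) \, d\rho \; \phi_j(y), \quad y \in \Omega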


If y is in the interior of Ω then T_Ω(y) > 0 for every path r(τ),
and consequently

    \lim_{u \to \infty} h(y; u) = \lim_{u \to \infty} E\{\exp(-u T_{\Omega}(y))\} = 0

Thus, writing δ = 1/u,

(IV.8.10)  \lim_{\delta \to 0} \sum_{j=1}^{\infty} \lambda_j (\delta + \lambda_j)^{-1} \int_{\Omega} \phi_j(\rho) \, d\rho \; \phi_j(y) = 1

for every y in the interior of Ω. This analytic conclusion, expressing
the fact that the Fourier expansion of 1 with respect to the eigenfunctions
φ_j is summable to 1 by a certain method of summability,
emerges here as a consequence of the trivial observation that

    T_{\Omega}(y) > 0

A more refined argument will show that, if Ω satisfies certain
regularity conditions (e.g., it is locally star-shaped), then for
y ∉ Ω

    \lim_{u \to \infty} (1 - h(y; u)) = \lim_{u \to \infty} (1 - E\{\exp[-u T_{\Omega}(y)]\})
      = \mathrm{Prob}\{T_{\Omega}(y) > 0\} = U(y)

where U(y) is the capacitory potential of Ω, i.e., the harmonic
function which vanishes at infinity and approaches 1 as y approaches
a regular (in the sense of potential theory) boundary point.
Moreover (IV.8.10) is valid for every regular point of the
boundary, while at an irregular point the limit, while it still exists,
is less than 1.
We are thus led to an explicit formula

(IV.8.11)  U(y) = \lim_{\delta \to 0} \sum_{j=1}^{\infty} (\delta + \lambda_j)^{-1} \int_{\Omega} \phi_j(\rho) \, d\rho \; (2\pi)^{-1} \int_{\Omega} \|\rho - y\|^{-1} \phi_j(\rho) \, d\rho

for the capacitory potential of a region subject to mild regularity
conditions.

Although formula (IV.8.11) is purely “classical” in appearance
it was first discovered and proved by probabilistic methods,
exhibiting once more the usefulness and power of these methods.


APPENDIX I 


The Boltzmann Equation 


By G. E. UHLENBECK 


Part 1 


1. Introduction. The general theme of these two lectures will be 
the mathematical structure of the theories describing the so-called 
irreversible processes like heat conduction (equalization of tem- 
perature differences), viscosity (equalization of velocity differ- 
ences), etc. We know that practically all processes in nature have a 
definite tendency to go to an equilibrium state (or state of maxi- 
mum entropy), and the problem is to describe this approach from 
the molecular point of view, assuming that the structure of the 
molecules and the laws of their interaction are known. 

Since this field, after the basic work of Boltzmann, Maxwell, 
Chapman, Enskog, and others, has only recently been actively 
developed again, only the outline of the mathematical structure 
has become visible, and very few concrete results have been ob- 
tained. Because of this, and because of the lack of time, I can only 
try to present the general ideas, and state the mathematical prob- 
lems. I think they are of mathematical interest, since they are a 
kind of generalization of the ergodic theory, and apparently lead 
very soon to real terra incognita. 


2. The General Problem. We will consider only the simplest type
of molecular system: N point molecules in a vessel (volume V) repelling
each other by a known monotonic central force potential
φ(r_ij) between each pair (i, j), which has a finite range r_0, so that
φ(0) → +∞, and φ(r_0) = 0. And we shall use classical mechanics.
Quantum mechanics brings in additional features and perhaps
simplifications, but the basic problems are in my opinion essentially
the same.

The motion of the molecules can be represented by the motion
of one point, the Γ-point, in the 6N-dimensional phase space
(Γ-space) on the energy surface E(x_1 ... x_N) = const. (Notation:
x_i = (q_i, p_i) = coordinates and momenta of the ith particle.)
One knows, since Poincaré, that this motion is quasi-periodic;
starting from a finite region on the surface, the point will return to
the region after a Poincaré cycle. This is essentially a consequence
of the reversibility in time of the mechanical equations of motion.
Historically it has led to the famous Boltzmann-Zermelo controversy,¹
and to the erroneous conclusion that therefore it is impossible
to explain the irreversible processes by reversible mechanical
models. The reason why Poincaré’s theorem is not relevant for
the theory of irreversible processes is twofold:

(a) Since N is very large, the recurrence time for a fixed
“small” initial region can be estimated to become enormously long,
very much longer than the observed equalization or relaxation
time. That a bounded mechanical model has a recurrence time T_0
is therefore of no interest since we want to follow the system only
over times t ≪ T_0.

(b) The initial state is never completely specified, since any
macroscopic measurement gives only information about averages
over large groups of molecules. Therefore we must not consider
one mechanical system, but, in the language of Gibbs, an ensemble
of identical systems differing only in their initial phase and one
must follow the streaming of the “ensemble fluid” in time. Or in
other words, one must consider a probability distribution
D_N(x_1 ... x_N, t) in Γ-space and follow its development in time.
One knows that this is determined by the Liouville theorem:

(1)  \frac{\partial D_N}{\partial t} = \{H, D_N\}

where H is the Hamilton function, in our case:

1 See: P. and T. Ehrenfest, Grundlagen der Statistischen Mechanik, 
Enz. der Math. Wiss., Vol. IV, Art. 32.; this is also discussed in the book of 
D. ter Haar, Introduction to Statistical Mechanics. 


(2)  H = \sum_{i=1}^{N} \bigl( p_i^2/2m + U(q_i) \bigr) + \sum_{i<j} \phi(r_{ij})

(U(q_i) = potential of the outside force, and includes the “wall potential”
produced by the walls of the vessel, and r_ij = |q_i − q_j| = distance
between molecules i and j), and the curly brackets denote the
Poisson bracket:

(3)  \{H, D_N\} = \sum_{i=1}^{N} \Bigl( \frac{\partial H}{\partial q_i} \cdot \frac{\partial D_N}{\partial p_i} - \frac{\partial H}{\partial p_i} \cdot \frac{\partial D_N}{\partial q_i} \Bigr)

The Liouville theorem is an immediate consequence of the equations
of motion and allows one in principle to find D_N(t), if the
initial distribution D_N(x_1 ... x_N, 0) is given. All observable,
macroscopic properties of the system can be expressed as appropriate
average values of the distribution D_N. One may expect,
according to the ergodic theory, that any initial distribution will
in the course of time approach an equilibrium distribution, which
is in our case the so-called microcanonical distribution, that is the
uniform distribution between two neighboring energy surfaces E
and E + ΔE.² It is this approach which is of interest for the theory
of irreversible processes; it is not in conflict (and has little to do)
with the quasi-periodic motion of each point of the ensemble.
The general problem is to find the appropriate laws for the
approach to equilibrium. I say appropriate, because one does not
want to follow the change of D_N in all details. This would involve
the precise integration of the equations of motion, clearly an impossible
task and also of no interest, since only the change of some
averages (or macroscopic quantities) with time is required.
One may ask how the initial distribution D_N(0) should be
chosen so as to conform with our initial macroscopic knowledge of 
the system. The theory does not give a general recipe for this 


2 This approach is meant in the coarse-grained sense. See P. and T. 
Ehrenfest, loc. cit.; only in this sense can a precise meaning be given to the
discussion in Chapter 12 of Gibbs’ book. Cf. also Tolman, Statistical 
Mechanics, especially for the analogous development in quantum statistics. 




choice, and I thought for a long time that this was an essential gap 
in the theory. I do not think so any more for reasons which I will 
come back to in the second part. 


3. The Comparison with the Theory of Stochastic Processes. The
combination of a quasi-periodic motion of the individual members
of an ensemble, with a monotonic approach of the initial probability
distribution to an equilibrium distribution, is a familiar feature of
the theory of stationary Markoff processes.³

Let me assume first that all variables are discrete; then, let X
be a stochastic variable (or set of variables) which can assume the
discrete set of values X_i; let the observations be made at the
discrete time points t_s = sτ. If P(X_0 | X_i, s) is the conditional
probability that at time sτ, X = X_i if at time zero X = X_0, then
for a Markoff process P fulfills the so-called Smoluchowski equation

(4)  P(X_i, s) = \sum_k P(X_k, s - 1) \, P(X_k \mid X_i, 1)

where we have suppressed the initial value X_0. Call P(X_k | X_i, 1)
= Q(X_k, X_i) = transition probability, then since

    \sum_i Q(X_k, X_i) = 1

one can rewrite (4) in the form:

(5)  P(X_i, s) - P(X_i, s - 1) = {\sum_k}' \, [P(X_k, s - 1) \, Q(X_k, X_i) - P(X_i, s - 1) \, Q(X_i, X_k)]

where the prime means that the term k = i must be omitted.
Equation (5) has a simple interpretation: the rate of change of P
with time is equal to the “gains” (due to transitions X_k → X_i) minus
the “losses” (due to transitions X_i → X_k). If all variables are continuous
(5) would become:

(6)  \frac{\partial P(X, t)}{\partial t} = \int dY \, [P(Y, t) \, Q(Y, X) - P(X, t) \, Q(X, Y)]

o , 
(6) SE L fay [PY, new, X) — PIX, NOW Y) 


3 See W. Feller, Introduction to the Theory of Probability; Ming Chen 
Wang and G. E. Uhlenbeck, The Theory of Brownian Motion II, Rev. Mod. 
Phys., 17, 323 (1945); M. Kac, Amer. Math. Monthly, 54, 369 (1947). 


One knows, that under quite general assumptions for Q,
equation (4) or (5) has as consequence that P(X_i, s) approaches
monotonically for s → ∞ an “equilibrium” distribution W(X_i),
although each series of observations X_1, X_2, ... will show a quasi-
periodic behavior and therefore will not show the “arrow of time.”
Presumably the same is true for the continuous case, described
by (6).
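
A small numerical example may make the gain-loss form (5) and the approach to equilibrium concrete. The three-state transition matrix below is an arbitrary illustration (its rows sum to one, and its stationary distribution is (1/4, 1/2, 1/4)); nothing about it comes from the text, and the function names are assumptions. The sketch iterates (4) and checks that the one-step change of P agrees with the right side of (5).

```python
def step(P, Q):
    """One application of (4): P_i(s) = sum_k P_k(s-1) * Q[k][i]."""
    n = len(P)
    return [sum(P[k] * Q[k][i] for k in range(n)) for i in range(n)]

def gain_minus_loss(P, Q):
    """Right side of (5): gains minus losses, with the term k = i omitted."""
    n = len(P)
    return [
        sum(P[k] * Q[k][i] - P[i] * Q[i][k] for k in range(n) if k != i)
        for i in range(n)
    ]

# Q[k][i] = probability of a transition X_k -> X_i in one time step (illustrative).
Q = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
W = [0.25, 0.50, 0.25]          # stationary ("equilibrium") distribution of this Q

history = [[1.0, 0.0, 0.0]]     # start with certainty in the first state
for _ in range(60):
    history.append(step(history[-1], Q))

def distance_to_equilibrium(P):
    return sum(abs(p - w) for p, w in zip(P, W))

if __name__ == "__main__":
    for s in (0, 1, 2, 5, 10, 60):
        print(s, history[s], distance_to_equilibrium(history[s]))
```

The distribution P settles down to W, while any single sampled sequence of states would keep wandering among all three values.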

The relation between the motion of the gas and a random process
becomes plausible, if one tries to visualize the motion of the
Γ-point on the energy surface in Γ-space. Clearly the shape of this
surface is a kind of hypercylinder; omitting for a moment the intermolecular
forces, the surface E = constant will be spherical in the
impulse directions, and cylindrical in the coordinate directions
which are bounded by the volume V. If the gas is dilute (r_0³ ≪ V)
then the intermolecular forces will make sparsely distributed,
narrow and deep valleys in this hypercylinder. The motion of the
Γ-point as long as no collision occurs will be in a straight line, while
after any collision it will very quickly jump to another spot, move
in a straight line, jump again, and so on. The motion resembles
therefore a kind of random walk on the hypercylinder, and one
may expect therefore that the change of the distribution D_N(X, t),
with X = (x_1, x_2, ..., x_N), resembles a Markoff process since the
random walk problem is so to say the classical example of a Markoff
process.

Starting from the random walk picture, Kac⁴ has proposed for
the change of D_N(X, t) with time, instead of the Liouville equation,
an equation similar to equation (6). Kac made two simplifying
assumptions: (1) D_N depends only on the momenta of the particles,
so that X = (p_1, p_2, ..., p_N), which makes the motion of the Γ-
point resemble closely the random walk on a hypersphere; (2) the
gas is very dilute, so that only binary collisions need to be considered.
The transition probability for the random steps can be
determined from the dynamics of such a binary collision.

Under these assumptions, I have no doubt that the equation 

4 M. Kac, Proceedings of the Third Berkeley Conference on Mathematical 
Statistics and Probability, Vol. III, 1956, pp. 171-197. 




of Kac (nowadays called the master equation) describes the ap- 
proach to the microcanonical ensemble in a completely satisfactory 
way. The precise discussion leads to a number of interesting 
mathematical problems, especially if one wants to know the be- 
havior for N very large. From the physical point of view the 
simplifying assumptions are unfortunately rather restrictive and it 
is not clear how one should remove them. 

There remains the basic problem in which sense the master 
equation approximates the Liouville equation. Brout⁵ has made
a valiant attempt to elucidate this point, but a satisfactory answer 
is, I think, still lacking. 


4. The B-B-G-K-Y Equations and the General Conservation
Laws. A second line of development starts from the idea that in
most cases the macroscopic quantities in which one is interested
depend not on the complete distribution function D_N(x_1 ... x_N, t)
in Γ-space, but on the probability that a single particle is in a
certain phase range dp dq, irrespective of the phases of the other
particles. I will call this the μ-space distribution. Clearly this can
be obtained by integrating D_N over all x_i except one. Note that
D_N must be a symmetric function of x_1, x_2, ..., x_N, since all
particles are indistinguishable, so that it does not matter which x_i
is kept fixed. This symmetry requirement of D_N is clearly unaffected
by the Liouville equation, since H is symmetric.

By integrating the Liouville equation one obtains a hierarchy
of equations, derived independently and about simultaneously by
Bogoliubov, Born and H. S. Green, Kirkwood and Yvon,⁶ and
which I will call therefore the B-B-G-K-Y equations. Introduce
the partial distribution functions:

(7)  F_s(x_1, \ldots, x_s, t) = V^s \int \cdots \int D_N \, dx_{s+1} \cdots dx_N


5 R. Brout, Physica, 22, 509 (1956). 

6 N. N. Bogoliubov, J. Phys., 10, 265 (1946). This is an excerpt from
his book Problemy Dinamicheskoi Teorii v Statisticheskoi Fizike, Moscow,
1946. The papers of M. Born and H. S. Green are collected in a book 
called Kinetic Theory of Fluids. J. Kirkwood, J. Chem. Phys., 14, 180 
(1946); 15, 72 (1947). 


Integrating the Liouville equation over x_{s+1} ... x_N, one finds, if s
is fixed, in the limit N → ∞, V → ∞, V/N = v fixed:

(8)  \frac{\partial F_s}{\partial t} = \{H_s, F_s\} + \frac{1}{v} \int dx_{s+1} \Bigl\{ \sum_{i=1}^{s} \phi_{i,s+1}, \; F_{s+1} \Bigr\}

where H_s is the Hamilton function for the s-tuple of particles:

(9)  H_s = \sum_{i=1}^{s} \bigl( p_i^2/2m + U(q_i) \bigr) + \sum_{1 \le i < j \le s} \phi_{ij}

U(q_i) is the potential of the outside force, with the wall potential
excluded, and

    \phi_{ij} = \phi(|q_i - q_j|)

Equations (8) are the B-B-G-K-Y equations. The proof is
straightforward and will be omitted. Of special interest is the
case s = 1, which gives the change of the μ-space distribution;
written out in more detail it becomes:

(10)  \frac{\partial F_1}{\partial t} = -\frac{p_1}{m} \cdot \frac{\partial F_1}{\partial q_1} - K \cdot \frac{\partial F_1}{\partial p_1}
        + \frac{1}{v} \int dq_2 \int dp_2 \, \frac{\partial \phi(|q_1 - q_2|)}{\partial q_1} \cdot \frac{\partial F_2}{\partial p_1}

K = −∇_q U = outside force. The first two terms on the right
side describe the change of F_1 due to the streaming in the (six-
dimensional) μ-space; brought to the left-hand side they combine
with ∂F_1/∂t to form the total or substantial time derivative
DF_1/Dt, which is the change of F_1 if one moves with the particle in
the μ-space element dp dq. The last term in (10) describes the
change of F_1 due to the interaction with other molecules; it therefore
involves the pair distribution function F_2.

The usual macroscopic variables describing the state of the
gas are obtained from F_1 (and F_2) by the further averaging over
the impulse variable.⁷ For instance the density ρ is given by:

7 The best treatment of this part of the theory is by J. Kirkwood, 
J. Chem. Phys., 14, 180 (1946). 


190 PROBABILITY IN PHYSICAL SCIENCES 


q, = y=" fF (q, p.t) 


The average velocity: 


1N 
u(q, t) = — 7 pF, dp 
and by integrating (10) one obtains the continuity equation (con- 
servation of mass): 
Op 

11 — + di = 
(11) E + div (pu) 
Multiplying (10) first with p and then integrating over p, one 
obtains after some manipulation the equations of motion (conser- 
vation of momentum) in the form: 


Du, ð l 
(12) p= (= = “tu: Vu) = pK, e 


The stress tensor P,,1s the sum of two parts; the first part depends 
only on F, and is defined by: 
Nm 
P9 = = |v. U; Fy dp 
where U, = 1/m(p; — mu,) = thermal velocity; the second part 


depends on the intermolecular potential and on the pair distribu- 
tion function F,; it is given by: 


pe) = - dr a dp dp, F 
ü = —~ 5772 ee p dp, F(q, r, p, py, £) 


where r = q, — q. The total stress tensor is clearly symmetric. 
Finally one can derive from (10) an equation expressing the
conservation of energy; it has the usual form:

(13)  $$\rho\,\frac{D}{Dt}\left(\frac{Q}{\rho}\right) + \operatorname{div} q = -P_{\alpha\beta} D_{\alpha\beta}$$

The right-hand side is the work done by the stress tensor; it
involves the rate of strain tensor:

$$D_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial q_j} + \frac{\partial u_j}{\partial q_i}\right)$$

The left-hand side gives the increase in the internal energy,
involving:

$$Q = \frac{N}{V}\int dp\,\tfrac{1}{2} m U^2 F_1(q, p, t) + \frac{N^2}{2V^2}\int dp\int\!\!\int dq_1\,dp_1\,\phi(|q - q_1|)\,F_2(q, q_1, p, p_1, t)$$


and the loss due to the heat current density q, which just as $P_{ij}$ and
Q consists of two parts, one due to the purely kinetic motion
(involving $F_1$) and one due to the intermolecular forces involving $F_2$.
I omit the detailed formula.
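The moment definitions of the density, the average velocity, and the kinetic part of the stress tensor can be checked numerically in a simple case. The following is only a sketch under assumptions that are not part of the lecture: a single momentum variable instead of three, a drifting Maxwellian for $F_1$ normalized to unit integral over p, and a midpoint-rule quadrature. The names `moments`, `n` (standing for N/V), `kT`, and `u0` are hypothetical.

```python
import math

def moments(n, m, kT, u0, p_max=12.0, steps=4000):
    """Return (rho, u, P_kin) for a 1-D drifting Maxwellian F_1(p).

    Assumption: F_1 integrates to 1 over p, and n stands for N/V.
    """
    dp = 2.0 * p_max / steps
    norm = 1.0 / math.sqrt(2.0 * math.pi * m * kT)
    s0 = s1 = 0.0
    grid = []
    for k in range(steps):
        # midpoint rule on an interval centred on the drift momentum m*u0
        p = -p_max + (k + 0.5) * dp + m * u0
        F1 = norm * math.exp(-(p - m * u0) ** 2 / (2.0 * m * kT))
        grid.append((p, F1))
        s0 += F1 * dp
        s1 += p * F1 * dp
    rho = n * m * s0          # rho = (N m / V) * integral of F_1 dp
    u = n * s1 / rho          # u = (1/rho) (N/V) * integral of p F_1 dp
    # kinetic stress: (N m / V) * integral of U^2 F_1 dp, U = (p - m u)/m
    P_kin = n * m * sum(((p - m * u) / m) ** 2 * F1 * dp for p, F1 in grid)
    return rho, u, P_kin

rho, u, P_kin = moments(n=2.0, m=1.0, kT=1.5, u0=0.3)
```

For a Maxwellian the three integrals are known in closed form, so the sketch should return the mass density n m, the drift velocity u0, and the ideal-gas pressure n kT as the kinetic stress.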

It should be emphasized that all these derivations are purely
formal. They are rigorous consequences of the Liouville equations
in the limit $N \to \infty$, $V \to \infty$, N/V finite, and involve only
integrations and the interpretation of the appropriate average values.
No assumptions are made. As a result they form a general, but
empty, scheme. The scheme is empty, because the system of equations
is not closed. Equation (10) is not an equation for $F_1$ alone,
but involves $F_2$. The equations (11), (12), and (13) are not a set
of equations of motion for the macroscopic quantities $\rho$, u, and Q
since they involve $P_{ij}$ and q, of which the dependence on the
macroscopic variables is not known.

Clearly, one can not become wiser by just doing integrals! 
Further assumptions must be made; one must somehow express 
the idea that the actually observed macroscopic changes of state 
are the same for the overwhelming majority of the members of the 
ensemble, so that the average of some quantity over the ensemble 
will at any time represent the change with time of this quantity 
for the single system. 


5. The Boltzmann Equation and the Hydrodynamical Equations.
Closed equations for the change of the $\mu$-space distribution and of
the macroscopic variables are of course known, but they were
derived on intuitive or phenomenological grounds. The equation
describing the change of $F_1$ is the famous Boltzmann equation,
which is the basis of the kinetic theory of dilute gases. To write it
in the usual form,⁸ we will use velocities instead of impulses, and
put:


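$$f(t, x, \xi)\,dx\,d\xi = \frac{N}{V}\,F_1(q, p, t)\,dq\,dp$$

= number of particles in the $\mu$-space element $dx\,d\xi$


Then the Boltzmann equation is:

(14)  $$\frac{\partial f}{\partial t} + \xi_i\frac{\partial f}{\partial x_i} + X_i\frac{\partial f}{\partial \xi_i} = \int d\xi_1\int d\Omega\; g\,I(g, \theta)\,(f' f_1' - f f_1)$$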


The left-hand side is $Df/Dt$ as in equation (10) (note that $X_i$ is now
the acceleration $K_i/m$); the difference with (10) lies in the collision
term. Here the prime and the index 1 of the f's refer to the
velocity variable alone, so that for instance $f_1' = f(t, x, \xi_1')$, etc.
The four velocity variables refer to the velocities of the binary
collision

$$(\xi, \xi_1) \rightleftarrows (\xi', \xi_1'); \qquad g = |\xi - \xi_1| = |\xi' - \xi_1'| = \text{relative velocity}$$

which in a collision turns over the angle $\theta$ (see Fig. 1).

[Figure 1. Binary collision in the relative motion: collision parameter b, ring between b and b + db, action sphere, and the restituting collision.]

8 Boltzmann, Vorlesungen über Gastheorie, Vol. I. See also Chapman
and Cowling, Mathematical Theory of Non-uniform Gases.

The two terms on the right-hand side are the gains due to collisions
$(\xi', \xi_1') \to (\xi, \xi_1)$, the restituting collisions, and the losses due to the
direct collisions $(\xi, \xi_1) \to (\xi', \xi_1')$. To compute the number of direct
collisions, consider the relative motion. All molecules in velocity
range $d\xi_1$, falling in with a “collision parameter” between b and
b + db, and which lie in the little cylinder $g\,b\,db\,d\epsilon\,dt$ just outside
the action sphere ($\epsilon$ is a polar angle), will in the next dt collide,
with a deflection of g over $\theta$. The dependence between b, $\theta$, and g
is determined by the dynamics of the collision, that is to say by the
force law. Boltzmann now puts for the number of direct collisions
per cc. and per sec.:


(15)  $$g\,b\,db\,d\epsilon\,f f_1\,d\xi\,d\xi_1$$


that is, he assumes that the number is proportional to the product
of the numbers of collision partners. This “Stosszahl-Ansatz”⁹ is
certainly plausible, but of course it is not derived from the basic
mechanics of the model. Analogously one computes the number
of restituting collisions. In (14) one finally has written:


$b\,db\,d\epsilon = I(g, \theta)\,d\Omega$ = differential collision cross-section for a
collision in the solid angle $d\Omega = \sin\theta\,d\epsilon\,d\theta$.
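To make the relation $b\,db\,d\epsilon = I(g, \theta)\,d\Omega$ concrete, here is a small sketch for one particular force law. The choice of rigid elastic spheres of diameter d, for which $b = d\cos(\theta/2)$, is an assumption made only for illustration; the lecture leaves the force law open. The function names are hypothetical.

```python
import math

def impact_parameter(theta, d):
    """Rigid spheres of diameter d: collision parameter b for deflection angle theta."""
    return d * math.cos(theta / 2.0)

def cross_section(theta, d, h=1e-6):
    """I(theta) = (b / sin theta) * |db/dtheta|, with db/dtheta by central difference."""
    b = impact_parameter(theta, d)
    db = (impact_parameter(theta + h, d) - impact_parameter(theta - h, d)) / (2.0 * h)
    return b * abs(db) / math.sin(theta)

def total_cross_section(d, steps=2000):
    """Integrate I(theta) dOmega over all directions (midpoint rule in theta)."""
    dtheta = math.pi / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * dtheta
        total += cross_section(theta, d) * math.sin(theta) * dtheta
    return 2.0 * math.pi * total  # the integral over the angle epsilon gives 2*pi
```

For this model the differential cross-section is the constant $d^2/4$, independent of g and $\theta$, and the integral over all solid angles is the geometric target area $\pi d^2$. Other force laws give a cross-section that depends on both g and $\theta$.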


The Boltzmann equation has often been criticized. Especially
doubtful seems the fact that in (15) no correlation of the velocity
directions of the two collision partners has been taken into account,
although forces are present, so that not $F_2$ appears but the product
of two $F_1$'s. Clearly, Boltzmann had some sort of successive
approximation scheme in mind. In zero approximation, when the
intermolecular forces are neglected, the state of the gas is described
by $f(t, x, \xi)$; the collisions depending on pairs of molecules are the
next approximation and can therefore still be described by f.
Only if the density is so high that triple and higher collisions occur
with an appreciable rate will the correlation of the velocities have
to be taken into account.

The limitation of the validity of the Boltzmann equation is
therefore that the density of the gas must not be too high. Also
one may doubt if phenomena which vary very quickly in space or
in time can still be described by the equation. I hope to come back
to this point in the second lecture.


9 See P. and T. Ehrenfest, loc. cit.

A closed set of equations describing the change of the
macroscopic variables are the well-known hydrodynamical equations of
Navier-Stokes. They have the same form as the general transport
equations (11), (12), and (13), except that now one has in addition
equations expressing the stress tensor $P_{ij}$ and the heat current
density q in terms of the macroscopic variables. These equations
are:


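(16)  $$P_{ij} = p\,\delta_{ij} - 2\mu\left(D_{ij} - \tfrac{1}{3} D_{\alpha\alpha}\,\delta_{ij}\right) - \nu\,D_{\alpha\alpha}\,\delta_{ij}$$

      $$q = -\lambda \operatorname{grad} T$$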


involving the two viscosity coefficients $\mu$ and $\nu$, and the heat
conduction coefficient $\lambda$, which are characteristic for the gas, and may
be functions of $\rho$ and T, which have to be determined by experiment.
Together with the equation of state $p(\rho, T)$, and the thermal
equation of state $Q(\rho, T)$, one then has five equations for the
five macroscopic variables $\rho$, u, and T.
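As a small illustration of how a Newtonian closure of this kind is evaluated, the following sketch builds the stress tensor from a given velocity gradient. It assumes the usual convention in which both viscous terms enter with a minus sign and the shear term is made traceless; the function names and the numerical values are hypothetical and chosen only for the example.

```python
def rate_of_strain(grad_u):
    """D_ij = (1/2)(du_i/dq_j + du_j/dq_i); grad_u[i][j] holds du_i/dq_j."""
    return [[0.5 * (grad_u[i][j] + grad_u[j][i]) for j in range(3)]
            for i in range(3)]

def stress_tensor(grad_u, p, mu, nu):
    """P_ij = p d_ij - 2 mu (D_ij - D_aa d_ij / 3) - nu D_aa d_ij (assumed signs)."""
    D = rate_of_strain(grad_u)
    div_u = D[0][0] + D[1][1] + D[2][2]
    P = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            delta = 1.0 if i == j else 0.0
            P[i][j] = (p * delta
                       - 2.0 * mu * (D[i][j] - div_u * delta / 3.0)
                       - nu * div_u * delta)
    return P

# simple shear flow: u_x grows linearly with y, no compression
shear = [[0.0, 0.7, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
P_shear = stress_tensor(shear, p=1.0, mu=0.2, nu=0.05)

# uniform expansion: only the second viscosity coefficient contributes
expansion = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]
P_exp = stress_tensor(expansion, p=1.0, mu=0.2, nu=0.05)
```

In the shear flow only the first viscosity coefficient appears (in the off-diagonal components), while in the uniform expansion the traceless part vanishes and only the second coefficient modifies the pressure. The tensor is symmetric by construction.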


6. Summary. The relations of the various descriptions of the
state of the gas can be summarized by the following scheme:


                 Mechanical process              Irreversible process

$\Gamma$-space          Liouville               · · · >    Master equation
description      equation                           (description by a
                                                    stochastic process)
                     |                                   :
                     v                                   v
$\mu$-space          B-B-G-K-Y               · · · >    Boltzmann equation
description      hierarchy of                       for low density
                 equations                          + generalizations
                     |                                   :
                     v                                   v
Macroscopic      Transport equations     · · · >    Hydrodynamical
description      (general conservation              equations; Euler,
                 laws)                              Stokes-Navier




Here at the left-hand side the drawn arrows indicate rigorous 
derivations: the dotted arrows indicate relations, which are ap- 
proximate and in which further assumptions have to be made. 
Each dotted arrow really represents a whole theory to which at 
least one lecture should be devoted, especially since most of these 
relations are still far from clear! There is however, I believe, a 
general point of view, due to Bogoliubov, which I will try to 
sketch in the next part. 


Part 2


7. The Ideas of Bogoliubov. The main problem is the elucidation
of the relation between the right- and left-hand side of our general 
scheme. Since the development in time is our main concern, it is 
important first to discuss which times enter into the problem. 
Clearly, from the model, the following times occur, arranged 
according to the order of magnitude: 

(a) the molecular interaction time, or time of a collision:

(1)  $$\tau = r_0 / v_{\mathrm{av}}$$

where $v_{\mathrm{av}}$ is an average velocity.

(b) the mean free time, or time between collisions:

(2)  $$t_0 = \lambda / v_{\mathrm{av}}$$

where $\lambda$ is a mean free path. For gases of moderate density
$t_0 \gg \tau$ (say $10^2$ or $10^3$ times larger). For liquids $t_0$ and $\tau$ become of
the same order of magnitude.

(c) the macroscopic relaxation time,

(3)  $$\theta_0 = L / v_{\mathrm{av}}$$

Here L is a macroscopic length, for instance the distance of parallel
plates between which the gas flows or conducts heat, or the
wavelength of sound which propagates through the gas, etc. In general
terms L is a distance over which one has an appreciable change of
any of the macroscopic quantities $\rho$, u, or T. For the usual, slowly
varying phenomena $\theta_0 \gg t_0$; in exceptional cases (very
high-frequency sound, shock waves) $\theta_0$ can become of the same order as
$t_0$.
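The ordering of the three times can be made tangible with rough numbers. The sketch below is only an order-of-magnitude illustration; the input values (a force range of about 3e-8 cm, a mean free path of about 1e-5 cm, an average velocity of about 5e4 cm/sec, and a macroscopic length of 1 cm) are assumptions typical of a gas under ordinary conditions and do not come from the lecture.

```python
def time_scales(r0, mean_free_path, length, v_av):
    """Return (tau, t0, theta0): collision time, mean free time, macroscopic time."""
    tau = r0 / v_av                 # (1) duration of a collision
    t0 = mean_free_path / v_av      # (2) time between collisions
    theta0 = length / v_av          # (3) macroscopic relaxation time
    return tau, t0, theta0

# assumed illustrative values, in cm and cm/sec
tau, t0, theta0 = time_scales(r0=3e-8, mean_free_path=1e-5, length=1.0, v_av=5e4)
ratio_kinetic = t0 / tau            # equals mean_free_path / r0
ratio_hydro = theta0 / t0           # equals length / mean_free_path
```

With these inputs the ratios are just the length ratios $\lambda/r_0$ and $L/\lambda$, so the separation of the time scales is really a separation of the three lengths $r_0$, $\lambda$, and L.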




Let us assume the case of a moderately dense gas with slowly
varying macroscopic quantities, so that $\theta_0 \gg t_0 \gg \tau$. Physically
one can expect that the temporal development of $D_N$ will then take
place in various stages.


a. The Initial Stage. From the arbitrary initial distribution
$D_N(x_1, x_2, \dots, x_N, 0)$, one can expect that all distribution functions
$F_s(x_1, \dots, x_s, t)$ will vary very quickly during a time of the order $\tau$,
when $s \ge 2$. How they vary will depend on the initial situation,
and therefore cannot be described in a general way. Only the
first distribution function $F_1(x, t)$ does not involve directly the
repulsive intermolecular interactions and therefore will vary slowly.


b. The Kinetic Stage. After a time of order $\tau$ one can expect
that the temporal development will be governed by the change in
time of $F_1$. Since in equilibrium the higher distribution functions
$F_s$ ($s \ge 2$) depend on $F_1$ (since then
$F_s = F_1(x_1) \dots F_1(x_s) \cdot \exp(-\beta \sum_{i<j} \phi(r_{ij}))$ at least in first
approximation), one can expect that $F_s$ will have the form:

(4)  $$F_s(x_1, \dots, x_s, t) \to F_s(x_1, \dots, x_s; F_1)$$

in which the whole time dependence sits in $F_1$, whatever the initial
distribution is. (Equation (4) is therefore not the trivial statement
that one always may use $F_1$ instead of t, since then $F_s$ would
also depend on the initial distribution of $F_s$.) The equation for $F_1$
can be expected to be of the form:

(5)  $$\frac{\partial F_1}{\partial t} = A(x; F_1)$$

(kinetic equation or general Boltzmann equation), and this equation
then describes the whole temporal development of the state
of the gas.

After the time $\tau$ one gets therefore a kind of contraction or
shortening of the description of the state of the gas, in which the
temporal development will be determined by much fewer variables
(namely $F_1$ instead of all the functions $F_s$).

One can also say that in the time $\tau$ a first smoothing or
chaotization takes place in which the detailed initial information is lost.
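The idea that a quickly varying quantity becomes, after a short time, a function of a slowly varying one can be illustrated by a toy model. The model below is not taken from the lecture and is only a loose analogy: a slow variable x (playing the role of $F_1$) obeys dx/dt = -x, and a fast variable y (playing the role of the higher $F_s$) obeys dy/dt = -(y - x²)/eps with a small relaxation time eps. All names and values are hypothetical.

```python
def integrate(x0, y0, eps=0.01, t_end=1.0, dt=1e-4):
    """Explicit Euler for dx/dt = -x (slow) and dy/dt = -(y - x*x)/eps (fast)."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        # both updates use the values from the previous step
        x, y = x - dt * x, y - dt * (y - x * x) / eps
    return x, y

def slaved_value(x, eps=0.01):
    """Value of y on the attracting curve y = x^2 / (1 - 2*eps) of the toy model."""
    return x * x / (1.0 - 2.0 * eps)

# same slow variable, very different initial values of the fast variable
xa, ya = integrate(1.0, 0.0)
xb, yb = integrate(1.0, 5.0)
```

After a time long compared with eps the two runs agree and y is determined by x alone, whatever its initial value was. In this sense the description has been contracted from (x, y) to x, and the equation for x alone plays the role of the kinetic equation.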




c. The Hydrodynamical Stage. After a time of order $t_0$ a
second smoothing or chaotization takes place. Very quickly
(namely a few collisions, therefore in a time of order $t_0$) the
equilibrium in the velocities gets nearly established, and the further
temporal development will be governed by the change in time of
the macroscopic variables $\rho$, u, and T (or internal energy Q). The
distribution function $F_1$ will now be of the form (since in
equilibrium it has this form):

(6)  $$F_1(q, p, t) = F_1(q, p; \rho, u, T)$$

in which the whole time dependence sits in $\rho$, u, T, whatever the
initial distribution of $F_1$ is. Analogous to (5), the equations for
$\rho$, u, T can be expected to be of the form:

(7)  $$\begin{aligned}
\frac{\partial \rho}{\partial t} &= R(q; \rho, u, T) \\
\frac{\partial u}{\partial t} &= U(q; \rho, u, T) \\
\frac{\partial T}{\partial t} &= \Theta(q; \rho, u, T)
\end{aligned}$$

which are the general hydrodynamical equations, which now
describe the further temporal development of the state of the gas.
This is a second contraction or shortening of the description; $\rho$, u, T
are the first five moments of $F_1$ with regard to p, and the temporal
development is therefore now determined by still fewer variables.

Perhaps there is a further stage in the temporal development
towards equilibrium, namely the turbulent stage. The treatment of
Taylor, Kármán-Howarth, and others of isotropic turbulence by
averaging the Stokes-Navier equations, is reminiscent of the
derivation of the B-B-G-K-Y equations from the Liouville
equation. One obtains again a hierarchy of equations involving
successively the pair, triple, and quadruple correlations of the
velocities. Again, clearly further assumptions must be made.
From the point of view of Bogoliubov, one first has to find the
different times or lengths involved. If it is possible to split $\theta_0$ (or
L) up in two or more times (or lengths), then one may expect that
a further contraction of the description is possible. Perhaps in
isotropic turbulence L can be split in the mesh-width of the mesh
producing the turbulence, and in the correlation length, which is
about a factor ten smaller. Perhaps then the single function which
describes the temporal development is the pair distribution
function, and all higher distributions are functionally dependent on the
pair distribution. Perhaps the recent theory of Chandrasekhar¹⁰
can be looked upon as a first step in this general direction. Note
that I have started almost every sentence with the word perhaps!
All this is speculation, and I know nothing more than I said.

Reference should also be made to the recent work of L. van
Hove,¹¹ and W. Kohn and J. M. Luttinger¹² in quantum statistical
mechanics. These authors have for the first time derived in a
satisfactory way the so-called transport equation (which in our 
terminology is the quantum analog of the master equation) from 
the quantum Liouville equation. An essential feature of their 
derivation is that the off-diagonal terms of the density matrix 
(which vary quickly in time) are expressed in the slowly varying 
diagonal terms. Only these diagonal terms enter in the transport 
or master equation, so that again a contraction of the description 
of the state of the system is obtained. 

It seems likely that such successive contractions of the
description are an essential feature of the theory of irreversible
processes. Of course, that such a contraction is possible must be a
property of the basic equations of the system. Besides the
existence of a series of relaxation times (like $\tau$, $t_0$ and $\theta_0$) or relaxation
lengths, the basic equations must have the property that their
solutions for practically all initial conditions show a kind of
piecewise ergodic behavior in time. The solutions, starting from any
initial condition, must crowd together to a curve (form a
“Verdichtungs-Kurve”), which is described by fewer variables and by
the next set of equations, and so on. Clearly, there are therefore
two basic problems:

1. Show that the solutions of the basic equations (the Liouville
or B-B-G-K-Y equations in the kinetic stage, the
Boltzmann equation in the hydrodynamic stage) have such a
piecewise ergodic behavior in time. Usually the physicist will assume
this, and he will be more interested in the second problem:

2. Find the variables and the equations which describe the
“Verdichtungs-Kurve” or the secular behavior of the solutions of
the basic equations. I will try to sketch how Bogoliubov proposes
to solve this problem.


10 S. Chandrasekhar, Proc. Roy. Soc. (London), A 229, 1 (1955).
11 L. van Hove, Physica, 21, 517 (1955); 23, 441 (1957).
12 W. Kohn and J. M. Luttinger, Phys. Rev., 108, 590 (1957); 109, 1892 (1958).


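8. The Kinetic Stage. An important feature of the method of
Bogoliubov is that the contraction of the equations is coupled with
a successive approximation procedure or power series development.
In the kinetic stage this development is the well-known virial
development in powers of the density 1/v. One writes equations
(5) and (4) in the form:

(8)  $$\frac{\partial F_1}{\partial t} = A_0(x; F_1) + \frac{1}{v} A_1(x; F_1) + \frac{1}{v^2} A_2(x; F_1) + \dots$$

(9)  $$F_s(x_1 \dots x_s, t) \to F_s(x_1 \dots x_s; F_1) = F_s^{(0)}(x_1 \dots x_s; F_1) + \frac{1}{v} F_s^{(1)}(x_1 \dots x_s; F_1) + \frac{1}{v^2} F_s^{(2)}(x_1 \dots x_s; F_1) + \dots \qquad (s \ge 2)$$

Such a development really amounts to a development in powers of
$\tau/t_0$; however, it is simpler to use 1/v. Compare now (8) and (9)
with the B-B-G-K-Y equations:

$$\frac{\partial F_s}{\partial t} = \{H_s, F_s\} + \frac{1}{v}\int dx_{s+1}\left\{\sum_{i=1}^{s} \phi_{i,s+1},\; F_{s+1}\right\}$$

and take first s = 1. Using the expansion (9) for $F_2$, one gets:

$$\frac{\partial F_1}{\partial t} = \{H_1, F_1\} + \sum_{l=0}^{\infty} \frac{1}{v^{l+1}} \int dx_2\,\{\phi_{12}, F_2^{(l)}\}$$

Comparing with (8) and equating equal powers of 1/v, one finds:

(10)  $$\begin{aligned}
A_0(x_1; F_1) &= \{H_1, F_1\} = -\frac{p_1}{m}\cdot\frac{\partial F_1}{\partial q_1} - K\cdot\frac{\partial F_1}{\partial p_1} \\
A_1(x_1; F_1) &= \int dx_2\,\{\phi_{12}, F_2^{(0)}(x_1, x_2; F_1)\} \\
A_2(x_1; F_1) &= \int dx_2\,\{\phi_{12}, F_2^{(1)}(x_1, x_2; F_1)\}
\end{aligned}$$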


etc. $A_0$ is therefore the streaming term, while the higher A's
contain the pair distribution function in a one step lower
approximation. Next, consider the B-B-G-K-Y equations for $s \ge 2$.
The time derivative $\partial F_s/\partial t$ must now be replaced by:

$$\frac{\partial F_s}{\partial t} = \frac{\delta F_s}{\delta F_1}\,\frac{\partial F_1}{\partial t}$$

where $\delta F_s/\delta F_1$ is the functional derivative with respect to $F_1$. Replacing
$\partial F_1/\partial t$ by (8), introducing for the $F_s$ the expansion (9) and
equating equal powers of 1/v, one obtains:

(11)  $$\begin{aligned}
\{H_s, F_s^{(0)}\} - D_0 F_s^{(0)} &= 0 \\
\{H_s, F_s^{(1)}\} - D_0 F_s^{(1)} &= D_1 F_s^{(0)} - \int dx_{s+1}\left\{\sum_{i=1}^{s} \phi_{i,s+1},\; F_{s+1}^{(0)}\right\}
\end{aligned}$$

etc., where the operator $D_k$ acting on any functional $\psi(x_1 \dots x_s; F_1)$
is defined by:

$$D_k \psi = \frac{\delta \psi}{\delta F_1}\,A_k(x; F_1)$$
It is now clear how the successive approximations are calculated.
The first equation of (11) involves only $F_s^{(0)}$; it contains $A_0$, which
is known. Solving for $F_2^{(0)}$, one finds from the second equation of
(10) $A_1(x; F_1)$. This determines completely the right-hand side
of the second equation of (11), from which then $F_s^{(1)}$ can be
computed. This in turn gives $A_2(x; F_1)$, etc.
All this can be carried out at least in a formal way. To give
an idea of the type of results, let me only mention that one finds
in first approximation:

(12)  $$F_s^{(0)}(x_1 \dots x_s; F_1) = \prod_{i=1}^{s} F_1\big(Q_i^{(s)}, P_i^{(s)}, t\big)$$

Here $P_i^{(s)}$ is the initial impulse of the ith particle in the s-tuple
collision (governed by $H_s$), which leads to the configuration $x_1 \dots x_s$
at time t; $Q_i^{(s)}$ is the position which the ith particle would have
had at time t, if it had continued to move with the constant initial
impulse $P_i^{(s)}$ instead of taking part in the s-tuple collision. Both
$P_i^{(s)}$ and $Q_i^{(s)}$ are therefore functions of $x_1 \dots x_s$, which can be
found if the dynamical equations of the s-tuple collision can be
integrated. Since $F_s^{(0)}$ is a product of functions $F_1$, one can say
that in this approximation and in the sense of (12) the s points
$x_1 \dots x_s$ are uncorrelated; it should be emphasized that this is only
true in the first approximation; $F_s^{(1)}$ is no longer a product of
functions $F_1$. Putting (12) for s = 2 in the equation (10) for $A_1(x_1; F_1)$,
one can prove that $A_1$ becomes the Boltzmann collision term, at
least if in this collision term the difference of the positions of the
two collision partners can be neglected.

As I said, in this way one can go on. The next approximation
in the kinetic equation, $A_2(x_1; F_1)$, depends on the triple collisions,
etc. The virial development of the kinetic equations is therefore
similar to the familiar virial development of the equilibrium
properties of the gas in the sense that the interactions of more and more
molecules are taken into account.


9. The Hydrodynamical Stage; Preliminary Results. In the
hydrodynamical stage, the kinetic equation (8) is the basic
equation. Also now the contraction procedure is coupled with a power
series development. The development parameter is now $t_0/\theta_0$ or
$\lambda/L$; it measures therefore the relative change of the macroscopic
quantities over a mean free path, and we may call it therefore the
uniformity parameter $\mu$. Analogous to (8), we now develop the
hydrodynamic equations (7) as follows:

(13)  $$\begin{aligned}
\frac{\partial \rho}{\partial t} &= \mu R_1(q; \rho, u, T) + \mu^2 R_2(q; \rho, u, T) + \dots \\
\frac{\partial u}{\partial t} &= \mu U_1(q; \rho, u, T) + \mu^2 U_2(q; \rho, u, T) + \dots
\end{aligned}$$

and a similar equation for $\partial T/\partial t$. The development starts with the
first power of $\mu$, since $\mu = 0$ corresponds to the equilibrium
distribution in which all macroscopic variables are constant in time.
Analogous to (9), we now develop (6):

(14)  $$F_1(q, p, t) = f_0(q, p; \rho, u, T) + \mu f_1(q, p; \rho, u, T) + \dots$$


Using the kinetic equation one then can determine successively
the functionals $R_n$, $U_n$, $\Theta_n$ and the successive approximations of the
distribution function $F_1$.

If in the kinetic equation (8) one stops with the term
proportional to 1/v, so that (8) becomes essentially the Boltzmann
equation (only the streaming terms and the binary collision term
are taken into account), then the contraction procedure in $\mu$ is
completely equivalent with the Enskog method for deriving the
hydrodynamical equations. In this case one knows that:

1. In first order of $\mu$, one gets the ideal fluid or Euler
equations plus the ideal gas law for the equation of state.

2. In second order of $\mu$, one gets the Stokes-Navier equations
with explicit expressions for the viscosity and heat conduction
coefficients in terms of the intermolecular potential. These
coefficients are independent of the density (one of the classical results
of the kinetic theory of gases), and their values and temperature
dependence are in very satisfactory agreement with experiment.
For our model (monoatomic gases) there is only one viscosity
coefficient; $\nu$ is equal to zero. The equation of state is still the
ideal gas law. This is slightly inconsistent, since with binary
interactions one would expect the second virial coefficient to come
in. This is due to the replacement of $A_1(x; F_1)$ by the Boltzmann
collision operator even for the case that $F_1$ depends appreciably on
q. If one makes the proper allowance for this dependence, then,
as Bogoliubov has shown, one obtains the expected first deviation
from the ideal gas law.

3. In third order of $\mu$, one obtains the so-called Burnett
equations. They are very complicated and so far very few applications
to specific problems have been made. In fact, in specific cases it
is usually simpler not to use the Enskog expansion, but to start
directly from the Boltzmann equation itself. However, the Enskog
development is not in any way wrong; it only becomes quite
complicated in the higher orders of $\mu$, and probably the expansion is
not convergent, but asymptotic.

If in the kinetic equation (8) the terms with $A_2$, $A_3$, etc. are
kept, so that triple and higher-order collisions are taken into
account, the situation becomes much more complicated. In first
order of $\mu$, one would still expect to obtain the ideal fluid or Euler
equations, but now with an equation of state containing as many
virial coefficients as correspond to the $A_n(x; F_1)$ which are
included in the kinetic equation. This has actually been proved
recently by Mr. S. T. Choh,¹³ at least up to the terms proportional
to $1/v^3$ (fourth virial coefficient); the general proof depends on
as yet unsolved combinatorial problems.

In second order of $\mu$, including in the kinetic equation the
triple collision term $(1/v^2)A_2(x; F_1)$, Mr. Choh has been able to
show that one obtains again the Stokes-Navier equations.
However, now the viscosity and heat conduction coefficient depend on
the density; in fact one obtains for these transport coefficients
virial expansions of the form:

$$\mu = \mu_0(T) + \rho\,\mu_1(T) + \dots$$
$$\lambda = \lambda_0(T) + \rho\,\lambda_1(T) + \dots$$

where $\mu_0$ and $\lambda_0$ are the Chapman-Enskog expressions (dependent
on the binary collision cross-section), and $\mu_1$, $\lambda_1$ are the corrections
due to triple collisions. They are complicated expressions
involving the dynamics of the interaction of three particles, and therefore
it will be hard (but not impossible!) to obtain numerical results.
It is finally of interest to note that there is now also a second
viscosity coefficient, which in this approximation is proportional
to the density.


13 S. T. Choh, Dissertation, University of Michigan, 1958. 


APPENDIX II 


Quantum Mechanics


By A. R. HIBBS


Professor Kac has described various types of probability 
measures. He has also explained that in the description of quan- 
tum mechanics used by Feynman there is an analog of probabil- 
ity measure which enables one to describe the behavior of quantum- 
mechanical particles. He has mentioned that this quantum- 
mechanical measure has some peculiarities. It is, for example, 
complex-valued. 

In this paper I will give some additional information about the 
quantum-mechanical measure, and, in particular, cover the follow- 
ing three topics. First, I will discuss why this peculiar complex- 
valued measure is necessary in describing the behavior of a physical 
quantity. Second, I will discuss what the measure is and just how 
it purports to describe the behavior of a physical system. Third, 
I will show two different types of mathematical manipulations. 
One of these manipulations enables one to integrate a thing called 
a path integral and thereby arrive at a more or less complete 
description of the motion of a quantum-mechanical particle. 
The second manipulation is actually a slight perversion of the 
quantum-mechanical measure which, nevertheless, permits one 
to solve still another type of quantum-mechanical problem. 

We begin then with a discussion of the reasoning behind the 
quantum-mechanical measure. Since this is a lecture in physics, 
it is only reasonable that we should start out with an experiment, 
an imaginary experiment perhaps, but nonetheless an experiment. 
Imagine that we have a source of electrons, the coil at the left of 
Fig. 1, from which electrons move toward the screen at (a), through 
the two holes A and B, then on toward an electron counter which 
can be moved up and down at the position of the screen at (b). 



[Figure 1. Electron source, screen (a) with the two holes A and B, and screen (b) with the movable electron counter.]

If electrons obeyed the laws of classical physics then we would
expect the electron counter to respond (that is, to indicate the
arrival of an electron) in only two positions. One of these
positions is determined by drawing a line from the source through the
hole A and on to the screen (b). The other one is obtained by
drawing a line from the source through B and to the screen at (b).
However, we are not too surprised to discover that this classical
behavior is not followed by electrons. The electrons do not arrive
at two distinct positions at (b), but instead have their arrival
positions smeared out, or distributed. We find in fact that it is
impossible to associate a unique arrival position with those initial
conditions (at the source) or limiting conditions (the positions of
the holes A and B) which determine the nature of our experiment.
Instead, the best we can do is associate with these conditions a
probability for the arrival of an electron at a particular point along
the screen at (b).

[Figure 2a. Distribution P(A) observed at (b) with only hole A open.]
[Figure 2b. Distribution P(B) observed at (b) with only hole B open.]

For simplicity let us assume that all of the electrons which 
leave the source have the same energy. Now let us investigate the 
nature of the probability laws by closing off one or the other of the 
two holes A and B. If only A is open then the distribution of 
electrons observed at (b) is like that shown in Fig. 2a, while the 
distribution with only hole B open is like that shown in Fig. 2b. 

Now if we were to open both of the holes at the same time, we
would expect, on the basis of a natural assumption, that the resulting
distribution is given by the sum of the distributions in Fig. 2a
and Fig. 2b. However, if we actually perform the experiment, the
distribution which we observe is like that shown in Fig. 3.


[Figure 3. Distribution P(A + B) observed at (b) with both holes open.]


This distribution, which has the character of an interference
pattern, is certainly not equal to the sum of the distributions of
hole A and hole B separately. That is, in terms of probabilities


(1)  $$P(A + B) \ne P(A) + P(B)$$


Why is this? Why is it that our customary notions about
probability and the combination of probabilities do not seem to apply?
We might guess that our assumption is wrong; namely, that
passage through hole A and passage through hole B are not separate
events but that perhaps the electron splits, part of it going through
A and the other part going through B. However, we can check this
idea by putting a source of light behind the holes and watching the
electrons as they go through. The electrons scatter light, so that
we can observe which hole is used by an electron by observing the
point at which light is scattered. If the source of electrons is of
sufficiently low intensity so that only one electron leaves at a time
we find that light is scattered always behind either A or B, but
never behind both simultaneously. Alternatively, we could actually
measure the charge which passes through A or B, and we would find
that it is a complete charge of one electron which passes through
either hole, one at a time.

As another possibility, we might imagine that the electron
undergoes some peculiar sort of motion going through one hole first,
then the other hole, then back through the first, and so on, before it
finally gets to the collector at (b). Again, we can check this
possibility by illuminating the holes and watching the scattered light.
We find that, for a source of electrons of sufficiently low intensity,
light is scattered behind only one hole each time an electron
goes from the source to the counter.

From these observations we are forced to draw the following 
conclusions. 

1. The motion of electrons must be described by some sort of 
probability law, since we can connect only a distribution of final 
positions to a set of initial conditions. 

2. The rules governing the combination of probabilities for 
alternate paths, or modes, of motion are not the same as the rules 
to which we are accustomed in ordinary probability theory. 

It turns out that we can describe the motion of electrons by
introducing a new mathematical concept called the “probability
amplitude.” We label the probability amplitude $\phi$, and associate
an amplitude with each alternate method available to the electron
for moving from the source to the counter. Thus in the present
experiment we have two amplitudes, $\phi(A)$ and $\phi(B)$, one for each of
the two holes. The rule for combining alternate paths is to add
the amplitudes. Thus the total amplitude for an electron to pass
from the source to the counter when both holes are open is


(2)  $$\phi(A + B) = \phi(A) + \phi(B)$$


Now, the probability for an event is postulated to be the
square of the absolute value of the amplitude for the event. Thus,
in general,


(3)  $$P = |\phi|^2$$


The amplitude, and thus the probability also, is a function of the 
initial condition, the final position, and of course, any constraints 
along the way, such as the position of the two holes A and B. 

Using the foregoing rules we can construct the probability 
for an electron to arrive at a particular point of the screen at (b). 
With both holes A and B open it is 


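(4)  $$\begin{aligned}
P(A + B) &= |\phi(A + B)|^2 = |\phi(A) + \phi(B)|^2 \\
&= |\phi(A)|^2 + |\phi(B)|^2 + \phi(A)\phi^*(B) + \phi^*(A)\phi(B) \\
&= P(A) + P(B) + 2\operatorname{Re}[\phi(A)\phi^*(B)]
\end{aligned}$$


In general, the real part of the product $\phi(A)\phi^*(B)$ is not zero, so
that we find


(5)  $$P(A + B) \ne P(A) + P(B)$$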
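The rule of adding amplitudes and squaring can be tried out numerically. The following sketch uses two made-up complex amplitudes of equal magnitude and an adjustable phase difference; the specific numbers and the function name `probabilities` are assumptions chosen only for illustration and do not describe any particular experiment.

```python
import cmath
import math

def probabilities(phi_a, phi_b):
    """Return P(A), P(B), P(A+B) and the interference term 2 Re[phi(A) phi*(B)]."""
    p_a = abs(phi_a) ** 2
    p_b = abs(phi_b) ** 2
    p_ab = abs(phi_a + phi_b) ** 2          # add amplitudes first, then square
    interference = 2.0 * (phi_a * phi_b.conjugate()).real
    return p_a, p_b, p_ab, interference

# equal magnitudes, three different relative phases (assumed values)
in_phase = probabilities(0.6 + 0j, 0.6 * cmath.exp(1j * 0.0))
quarter = probabilities(0.6 + 0j, 0.6 * cmath.exp(1j * math.pi / 2.0))
opposite = probabilities(0.6 + 0j, 0.6 * cmath.exp(1j * math.pi))
```

When the amplitudes are in phase the combined probability is twice the sum of the separate ones, when they are opposite in phase it vanishes, and only for a phase difference of a quarter turn does the classical sum P(A) + P(B) come out. As the counter is moved along the screen the relative phase changes, which is what produces the peaks and valleys of the interference pattern.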


One might imagine, as we were discussing the basis for the 
concept of a probability amplitude, that we have introduced the 
possibility of a paradox. For suppose we put our light source 
behind a screen at (a) and watch to see at which hole light is 
scattered. We have already said that we can account for each 
electron separately going through one hole or the other in this way. 
Suppose we keep a tally sheet on which we record separately the 
electrons through hole A and those through hole B, together with 
their final positions on the screen (b). If we make up separate 
distribution curves for these two classes of electrons we will find 
the curves to be like those of Figs. 2a and 2b. But now, if we 
combine these curves we get a curve without any sort of inter- 
ference pattern. By all that’s reasonable, this must coincide with 
the observed distribution function of the combined classes. 

We can compare this combined curve, as taken from the tally
sheets, with the observed distribution of all the electrons. If we
do this we find that they coincide. The interference phenomenon
has disappeared! How can this be? In our previous experiment we
observed the interference pattern of Fig. 3, whereas now we get the
comparatively smooth distribution obtained by adding together
Figs. 2a and 2b. The only difference in the experiments is that in
the latter case we looked at the electrons as they came through the
holes.

So now we have discovered another peculiar fact about 
quantum-mechanical behavior. If you look at an experiment in 
the middle, so to speak, you change the outcome. We can under- 
stand physically why this happens. In working with electrons 
we are dealing with extremely small, extremely light particles. 
They are so light, in fact, that their motion can be profoundly 
disturbed by a collision with a quantum of light. Every light 
quantum carries with it its momentum, which is inversely propor- 
tional to its wavelength. In the interaction with an electron (and 
this is the interaction which produces the scattering that we ob- 
serve), there is an interchange of momentum between the light 
quantum and the electron. The momentum of the electron is 
sufficiently altered, and in a random way, so that the interference 
pattern in the resulting distribution function is smeared out. The 
peaks and valleys of the pattern are no longer delineated. 

Now it may occur to you that there is a way to beat this game. 
We might try to make the source of light so weak that the amount 
of momentum that it carries is too small to noticeably affect the 
motion of the electron. However, as I have already said, the 
momentum of a quantum of light does not depend upon the in- 
tensity of the source but rather on the wavelength of the light. 
Well then, we could make the wavelength quite long and thus 
reduce the momentum of each quantum. However, there is a 
limit to this. As is well known, it is impossible to resolve the 
position of the source of the light quantum more accurately than, 
roughly, a distance equal to a wavelength. This result, in fact, 
limits the resolving power of an optical microscope. 

In our experiment we wish to differentiate between hole A 
and hole B. Thus we must work with light whose wavelength is
less than the distance between these two holes. Now it just so
happens that the momentum carried by light of this wavelength 
is just enough to effectively smear out the interference pattern of 
the distribution. We conclude that we cannot—at least with this 
technique—tell which electron goes through which hole while still 
producing our interesting interference pattern. Nature has out- 
witted us again. 

As a matter of fact, there is no known experimental technique 
which will enable us to look at an experiment while it is in progress 
without disturbing the results. For ordinary pieces of matter, such 
as notebooks and pencils, the disturbance of observation is so slight 
as to be completely unnoticeable in our ordinary experience. It is 
only when we work with these extremely minute particles that the 
result becomes important. In this way we come to the rather 
disturbing conclusion that, for the experiment we are discussing, 
when we produce the distribution function of Fig. 3 we are not 
allowed to look to see through which hole the electrons come. 
Sometimes physicists will say that we cannot even assert they 
came through either hole, since only those things which are ob- 
served can be asserted. The most that we can say is that each 
electron has an amplitude to go through each hole, and these 
amplitudes interfere to produce the resulting distribution function. 

This limitation on experimental techniques is the Heisenberg 
uncertainty principle. We can state the principle in a number of 
ways. The traditional statement concerns the uncertainty involved 
in the simultaneous measurement of both the position and the 
momentum of a particle. The statement is that in any simulta- 
neous measurement of both position and momentum the product 
of the uncertainties in the measurement of these two quantities 
must be of the order of, or greater than, Planck’s constant of action
$h$ divided by $2\pi$. (This combination, Planck’s constant divided
by $2\pi$, is usually written $\hbar$.) However, there is another way to
state the principle which is more in line with the discussion we 
have been carrying out here. That is the following. As soon as 
an observation is made on the motion of a particle, all the other 
alternative motions are immediately cancelled out. That is, if we
observe the electron to go through hole A, then it is no longer valid
to include in our calculations an amplitude for the electron passing 
through hole B. We know that it did not. Thus every observa- 
tion limits the available alternatives for the motion of the particle. 
These two statements of the uncertainty principle are actually 
equivalent. 

In interpreting quantum-mechanical behavior we have assert- 
ed that the outcome of an experiment can only be known probabil- 
istically. We have introduced a concept of a probability ampli- 
tude whose absolute square gives a probability or a distribution 
function. There is a very interesting question as to whether or not 
this probabilistic interpretation of quantum-mechanical behavior 
is a unique interpretation. Is there perhaps some more determi- 
nistic interpretation of the observed behavior? So far it would 
appear that the probability point of view is the only satisfactory 
one. However, this conclusion has certainly not been proved. 

Although this question is an interesting one, it is not, perhaps, 
of very great importance. The probabilistic interpretation offers 
the physicist an adequate explanation of the phenomena he
observes. At the present time, it is of more importance to understand
the peculiar behavior of nuclear matter. It is interesting to
speculate on what new concepts we will have to adopt in order to 
understand the particles which we are producing in our high-energy 
accelerators. 

Now we will look at the form of the probability amplitude $\phi$.
Actually, it has a very simple form given by

(6) $\phi \sim \exp(iS/\hbar)$

where $S$ is the action associated with the path described by $\phi$.
That is

(7) $S = \int L(\dot{x}, x, t)\,dt$


where L is the Lagrangian function given by the kinetic energy 
minus the potential energy, and the time integral is taken along 
the path in question between the initial and final times of the 
experiment. The path itself is just the trajectory of the particle. 
We can construct it, if we like, by generalizing our imaginary
experiment. Suppose we put very many screens between the
source and the collector and in each screen drill several holes. 
Then, by writing down the sequence of holes through which an 
electron passes and the time at which it passes through each hole, 
we approach a complete description of the path. We finally con- 
clude this description when, in the limit, we place an infinite num- 
ber of screens between the source and the collector and in each 
screen drill an infinite number of holes, so that, in fact, there are 
no screens left at all, and all we have is the path itself, position as 
a function of time. Then the total amplitude to go from the source 
to the collector is the sum, taken over all the infinite possible paths, 
of the amplitude for each path. Simplifying to one dimension in 
both space and time, we can write the amplitude to go from the 
point $x_a$ at the time $t_a$ to the point $x_b$ at the time $t_b$ as


(8) $K(x_b, t_b; x_a, t_a) = \int_a^b \exp\left[\frac{i}{\hbar} \int_{t_a}^{t_b} L\,dt\right] \mathcal{D}x(t)$


Here $\mathcal{D}$ symbolizes the fact that we are carrying out an operation
of integrating over paths.

There is an operational definition for path integrals which can
be used to actually evaluate the integrals. It is constructed in
analogy with an ordinary Riemann integral. We imagine a path
to be divided into a series of discrete time steps. We label the
points at which a path passes through each time division as
$x_1, x_2, \ldots, x_i, \ldots$, as shown in Fig. 4.


Figure 4.




In between the division points of the path we imagine the 
actual path is replaced by segments of a classical path. That is, 
between the division points we imagine the particle to be moving 
like a classical particle under the influence of the potential field 
existing at that point of space and time and constrained to pass 
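through the end points $x_{i-1}$ at $t_{i-1}$ and $x_i$ at $t_i$. Thus over this
segment of a path the action is


(9) $\int_{t_{i-1}}^{t_i} L\,dt \approx \varepsilon L[(x_i - x_{i-1})/\varepsilon,\ (x_i + x_{i-1})/2,\ (t_i + t_{i-1})/2]$


and for any one path the action over the whole path is approximately


(10) $S \approx \sum_{i=1}^{N} \varepsilon L[(x_i - x_{i-1})/\varepsilon,\ (x_i + x_{i-1})/2,\ (t_i + t_{i-1})/2]$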


Now the complete amplitude for the motion of a particle from its 
initial point a to its final point b is obtained by integrating the 
division points over the complete range of x. That is 


(11) $K(b, a) \sim \lim_{\varepsilon \to 0} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left[\frac{i}{\hbar} \sum_{i=1}^{N} \varepsilon L\right] dx_1 \cdots dx_{N-1}$

where the initial and final points, $x_0$ and $x_N$, are fixed and equal to
$x_a$ and $x_b$, and the number of divisions $N = (t_b - t_a)/\varepsilon$. For
Lagrangians which are quadratic in the time derivatives of $x$ all the
integrals turn out to be Gaussian and thus can be worked. However,
in passing to the limit of small $\varepsilon$, we discover to our dismay
that the function defined in equation (11) diverges. Evidently
some sort of normalizing factor is required.

In the definition of the ordinary Riemann integral we add the
ordinates at a discrete set of points and take the limit as the number
of points becomes infinite. However, we must multiply this
sum by the distance between points in order to avoid divergence.
For the path integral we must multiply the multiple integral by
$(2\pi i\hbar\varepsilon/m)^{-N/2}$.
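
The role of the normalizing factor can be seen in a small numerical sketch. To keep the integrals convergent on a computer we assume here the imaginary-time form of the free-particle problem (the time step enters as a real quantity, so that each slice is an ordinary Gaussian); this substitution, the units $\hbar = m = 1$, the grid, and all variable names are illustrative assumptions rather than part of the definition above.

```python
import numpy as np

hbar, m = 1.0, 1.0            # convenient units (assumption)
T, N = 1.0, 8                 # total imaginary time and number of slices
eps = T / N

x = np.linspace(-8.0, 8.0, 321)          # grid for the intermediate points
dx = x[1] - x[0]
A = np.sqrt(2 * np.pi * hbar * eps / m)  # imaginary-time analogue of the normalizing factor

# One slice: (1/A) exp(-m (x_i - x_{i-1})^2 / (2 hbar eps)), times dx for the quadrature
diff = x[:, None] - x[None, :]
step = np.exp(-m * diff**2 / (2 * hbar * eps)) / A * dx

# N slices carry N factors 1/A but only N - 1 integrations, hence the division by dx
kernel = np.linalg.matrix_power(step, N) / dx

ia, ib = np.argmin(np.abs(x - 0.0)), np.argmin(np.abs(x - 1.0))
numeric = kernel[ib, ia]
exact = np.sqrt(m / (2 * np.pi * hbar * T)) * np.exp(-m * (x[ib] - x[ia])**2 / (2 * hbar * T))
print(numeric, exact)
```

With one factor $1/A$ per time slice the multiple integral no longer depends on the number of slices; without it the result would grow or shrink without bound as the slices are refined.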

You may have noticed that in writing out this definition of a 
path integral we have tacitly made use of the idea that the paths 
can be divided in time and the results added up sequentially.
That is, we have made use of the principle that the amplitude for
events occurring in succession in time is given by the product of
amplitudes for each of the two events. The equation is


(12) $K(b, a) = \int_{-\infty}^{\infty} K(b, c)\,K(c, a)\,dx_c$


This result is valid whenever the action itself can be separated: that 
is, whenever the action along a path from one point to another can 
be written as the sum of the action from the initial point to some 
point in the middle, plus the action from the middle point on to 
the end. Since the action enters into the exponent of the ampli- 
tude, we can see how the product formula results. 

In relativistic quantum mechanics this result is not so imme- 
diate. Here the motion of an electron is determined by the field 
of photons through which it moves. Furthermore, the electron 
keeps discharging photons, then recollecting them somewhere 
further along its path. Thus, at any instant of time a complete 
description of the system requires not only a specification of the 
position and momentum of the electron but also the condition of 
the complete electromagnetic field which surrounds the electron, 
and this in turn depends upon the previous motion of the electron. 

Working with the path integral approach to quantum me- 
chanics it is easy to understand the classical description of the 
motion of large masses. Suppose, for example, we are considering 
the motion of a mass the size of a billiard ball. It has an amplitude
for motion along each possible path from its initial to its final
point, and each amplitude is proportional to the exponential of the
action multiplied by $i$ and divided by $\hbar$. Now if the mass is very
large the action will be very large compared to $\hbar$. Any slight
variation in path will mean a very large variation in the ratio of
action to $\hbar$, and the amplitude will oscillate wildly between +1
and —1. Thus, in adding together the amplitudes over small 
variations in path, the amplitudes will cancel each other out. The 
result will be zero except in the neighborhood of one particular 
path. Along this special path the action is an extremum. That 
is, small variations in path will produce no change whatsoever in 
the action. In this region the amplitudes summed over paths
will interfere constructively to produce a nonzero result. Thus,
only along a path for which the action is an extremum will the 
amplitude differ appreciably from zero. But this is exactly the 
classical result, namely that a body follows the path of least action. 

You may have heard from time to time about a quantity 
called a wave function which is a solution to Schrödinger’s wave
equation. It turns out that the amplitude which we wrote down
in equation (8) is itself a wave function of a special type and is a
solution to Schrödinger’s wave equation.

The equation, which is perhaps the most important mathematical
statement in the world today, has in the succeeding years
been reduced to


(13) $E\psi = H\psi$


This is an operator equation whose apparent simplicity is somewhat
deceiving. $E$ is the energy operator and $H$ the Hamiltonian
operator. For a particle moving in one dimension and subjected
to a potential $V(x, t)$ the equation is

(14) $-\frac{\hbar}{i} \frac{\partial \psi}{\partial t} = -\frac{\hbar^2}{2m} \frac{\partial^2 \psi}{\partial x^2} + V(x, t)\,\psi$

Its solutions are the wave functions $\psi(x, t)$ defined as functions of
one position in space and time. Such a wave function is actually
a special form of the amplitude defined in equation (8). It is a
form in which the initial point $x_a$, $t_a$ has been omitted from the
definition, and only the final point remains. Such amplitudes are 
useful in those cases where the past history of the particle is un- 
important, and all we need to know is some function describing 
the particle here and now. 

The proof that the probability amplitude, defined by path 
integrals, satisfies Schrödinger’s equation is obtained by investigating
the development of the amplitude over a very short interval
in time and short distance in space. Thus, using the product rule
for combining amplitudes of events occurring in succession in time
we write


(15) $K(x_b, t_b + \varepsilon; x_a, t_a) = \int_{-\infty}^{\infty} K(x_b, t_b + \varepsilon; x_b + \xi, t_b)\,K(x_b + \xi, t_b; x_a, t_a)\,d\xi$


The first term in the integrand can be written as

(16) $A^{-1} \exp[im\xi^2/(2\varepsilon\hbar) - (i\varepsilon/\hbar)V(x_b, t_b)]$

to first order in $\varepsilon$ and second order in $\xi$. The second term can be
expanded as a Taylor series in $\xi$, and then the integration over $\xi$
can be carried out for all the necessary terms.

The left-hand side of equation (15) is expanded as a Taylor
series in $\varepsilon$, then terms of the various orders in $\varepsilon$ are equated.

Equating terms of zero order shows that the normalizing
constant which we introduced to ensure the convergence of equation
(11) is, indeed, given by


(17) $A = \sqrt{2\pi i\hbar\varepsilon/m}$


Equating terms of first order in $\varepsilon$ yields Schrödinger’s wave equation,
where derivatives are carried out with respect to the end
point variables. (A few changes in sign will give an equation valid 
for derivatives with respect to the initial point variables.) 

The operational definition of a path integral as given by 
equation (11) is actually rather cumbersome for practical use. 
Fortunately there is another technique available which operates 
quite well for most quantum-mechanical problems. We make a
substitution of variables letting $x = \bar{x} + y$, where $\bar{x}$ is a path
which goes through the prescribed end points and along which the
action $S$ is an extremum. Then $y$ takes on the value zero at both
end points. Now we make this substitution of variables into the
Lagrangian $L$, and consider the integral of this Lagrangian along
a particular path. This integral is, of course, the action along the
path, and, since $\bar{x}$ is defined as the path along which $S$ is an
extremum, we know that all terms containing $y$ to the first power must
vanish in the integration. Therefore, in those particular cases in
which $L$ is quadratic in the spatial variables, there can be no cross
product terms of the form $\bar{x}y$ surviving the integration, and as a
result we find that the action along any path can be written as the
sum of two actions; thus


(18) $S[\bar{x} + y] = S_{cl}[\bar{x}] + S'[y]$

Here, $S_{cl}$ is the action along a path which would be followed by a
particle moving in accordance with classical laws, that is, the
classical action, and $S'$ is an additional action which depends only
upon the path variation $y$.
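
The separation (18) can be tried out on a concrete quadratic Lagrangian. The sketch below assumes a harmonic oscillator, $L = \tfrac{1}{2}m\dot{x}^2 - \tfrac{1}{2}m\omega^2 x^2$, with arbitrarily chosen end points and an arbitrary variation $y$; these choices and all names in the code are illustrative assumptions. The action is evaluated with the discretized rule (10).

```python
import numpy as np

m, omega = 1.0, 2.0                    # illustrative oscillator parameters (assumption)
xa, xb, T, N = 0.3, -0.5, 1.0, 2000    # end points, time span, number of steps
t = np.linspace(0.0, T, N + 1)
eps = T / N

def action(x):
    """Discretized action, as in (10), for L = m v^2/2 - m omega^2 x^2/2."""
    v = np.diff(x) / eps                # velocity on each segment
    xm = 0.5 * (x[1:] + x[:-1])         # midpoint position on each segment
    return np.sum(eps * (0.5 * m * v**2 - 0.5 * m * omega**2 * xm**2))

# Classical path through the end points (solution of x'' = -omega^2 x)
x_bar = (xa * np.sin(omega * (T - t)) + xb * np.sin(omega * t)) / np.sin(omega * T)
# An arbitrary variation that vanishes at both end points
y = 0.4 * np.sin(np.pi * t / T) + 0.2 * np.sin(3 * np.pi * t / T)

s_total = action(x_bar + y)
s_cl = action(x_bar)
s_prime = action(y)
print(s_total, s_cl + s_prime)          # agree up to discretization error
```

For this Lagrangian $S'[y]$ is simply the oscillator action evaluated on $y$ itself, so the cross terms are the only thing that could spoil the sum, and they integrate to zero.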

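Since the path $\bar{x}$ is a fixed path, only $y$ can be varied as we
sum over all paths. Thus we can write the path integral as


(19) $K(b, a) = \int \exp\left[\frac{i}{\hbar} \int L\,dt\right] \mathcal{D}x(t)$
     $= \int \exp\left(\frac{i}{\hbar}[S_{cl} + S']\right) \mathcal{D}y(t)$
     $= \exp\left(\frac{i}{\hbar} S_{cl}\right) \int \exp\left(\frac{i}{\hbar} S'[y]\right) \mathcal{D}y(t)$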


Here we see that the path integral goes over paths whose end points 
are both zero. The function giving the dependence of the ampli- 
tude upon the end points a and b has been pulled outside the path 
integral. Furthermore, it is easy to show that if the Lagrangian 
does not contain the time explicitly, then the remaining path 
integral can depend only upon the time interval and not upon the 
absolute value of the times at the end points. 

Of course, in the form of equation (19) we still have to eval- 
uate a path integral; however, for a large number of important and 
interesting problems the important physical facts are contained 
in the factor in front of the integral, and the path integral acts as 
little more than a normalizing constant. In fact, in many cases, 
the value of this path integral can actually be obtained by solving 
for the normalizing constant of the probability distribution, at 
least to within an unknown but unimportant phase factor (i.e.,
$\exp(i\delta)$ where $\delta$ is the constant phase). In other cases the
remaining path integral can sometimes be evaluated by considering 
a special case of the problem which happens to coincide with a 
previously worked example. For instance, in solving for a harmon- 
ic oscillator, the nature of this path integral can be determined 
by considering the special case in which the frequency of the 
oscillator is zero, that is, the special case in which the harmonic 
oscillator coincides with the free particle. If, after all, no such 
trick is available, it is always possible to return to the operational 
definition of equation (11). 




In the remainder of this paper I will give a brief description 
of a method of using the path integral approach to determine the 
lowest energy state of a quantum-mechanical system. There are 
already many techniques for carrying out this calculation. Prob- 
ably the most famous of these is the Rayleigh-Ritz variational 
method. First of all, suppose we define the energy states of a 
system. When such states exist we find that Schrödinger’s equation


(20) $-\frac{\hbar}{i} \frac{\partial \psi}{\partial t} = H\psi$

has eigenfunctions $\phi_n$ and eigenvalues $E_n$. Thus we can write

(21) $E_n \phi_n = H\phi_n$


Any solution of Schrödinger’s equation can be written as a
sum of eigenfunctions.


(22) $\psi = \sum_n c_n \phi_n \exp(-iE_n t/\hbar)$


Now suppose we make the substitution $it = \tau$ throughout
the problem. We will still have the same eigenfunctions and
eigenvalues, and any solution of Schrödinger’s equation will still
be written in the same form. But now this form becomes


(23) $\psi = \sum_n c_n \phi_n \exp(-E_n \tau/\hbar)$


If we take the limit of this expression as $\tau$ becomes very large (say,
equal to $T$), then to an increasingly good approximation we have


(24) $\psi \approx c_0 \phi_0 \exp(-E_0 T/\hbar)$


Now we know that any solution of Schrödinger’s equation
can be expressed as a path integral. Thus we can write


(25) $\psi = \int \exp(S/\hbar)\,\mathcal{D}x(\tau)$


and if we solve this path integral for very large values of $T$ the
result will be proportional to $\exp(-E_0 T/\hbar)$.
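
The large-$T$ behavior can be illustrated on a case where the answer is known. The sketch below assumes a harmonic oscillator in units $\hbar = m = \omega = 1$ and represents one short imaginary-time step of the path integral as a matrix on a grid; repeating the step $N$ times multiplies the result by the largest eigenvalue raised to the power $N$, which plays the part of $\exp(-E_0 T/\hbar)$. The potential, the grid, the step size, and the variable names are all assumptions made for illustration.

```python
import numpy as np

hbar, m, omega = 1.0, 1.0, 1.0       # oscillator in convenient units (assumption)
eps = 0.05                           # imaginary-time step
x = np.linspace(-6.0, 6.0, 241)
dx = x[1] - x[0]

V = 0.5 * m * omega**2 * x**2
diff = x[:, None] - x[None, :]
# Short-time kernel in imaginary time: kinetic Gaussian times a symmetric potential factor
step = (np.sqrt(m / (2 * np.pi * hbar * eps))
        * np.exp(-m * diff**2 / (2 * hbar * eps))
        * np.exp(-eps * (V[:, None] + V[None, :]) / (2 * hbar))
        * dx)

# For long times T = N * eps only the largest eigenvalue survives:
# lambda_0 ** N behaves like exp(-E_0 T / hbar)
lam0 = np.linalg.eigvalsh(step)[-1]
E0_estimate = -hbar * np.log(lam0) / eps
print(E0_estimate)                   # close to hbar * omega / 2
```

The estimate approaches $\hbar\omega/2$ as the step is refined, in line with the statement that the lowest energy controls the decay for large $T$.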

The solution of this path integral can be very difficult if $S$ is a
complicated function. The procedure we will adopt gives us a good
method of estimating the value of the path integral with an
artificially constructed form for the action. We will call this new form
$S_1$, and we will try to choose it so that it is as similar to the original
$S$ as possible and yet will yield a soluble path integral.

We will write the path integral as 


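(26) $\psi = \int \exp(S/\hbar)\,\mathcal{D}x(\tau) = \int \exp[(S - S_1)/\hbar] \exp(S_1/\hbar)\,\mathcal{D}x(\tau)$


and we will define the “average” value of a functional $F$ as


(27) $\langle F \rangle = \int F \exp(S_1/\hbar)\,\mathcal{D}x(\tau) \Big/ \int \exp(S_1/\hbar)\,\mathcal{D}x(\tau)$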


Next, we call upon an important and well-known lemma. Namely, 
that for any positive weighting function used to define an average, 
the following inequality holds 


(28) $\langle \exp F \rangle \geq \exp \langle F \rangle$
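
The lemma (28) is easy to try out with an ordinary finite average. In this sketch the weights and the values of $F$ are random numbers chosen purely for illustration; the names `weights`, `F` and `average` are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random(1000) + 1e-3      # any positive weighting function
F = rng.normal(size=1000)              # arbitrary real values of the functional

def average(values):
    """Weighted average <values> with the positive weights above."""
    return np.sum(weights * values) / np.sum(weights)

lhs = average(np.exp(F))               # <exp F>
rhs = np.exp(average(F))               # exp <F>
print(lhs >= rhs)
```

The inequality holds for every choice of positive weights because the exponential is a convex function.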

Using the definition of a functional average contained in 
equation (27) we will define the energy E by the relation 
(29) $\int \exp\left(\frac{1}{\hbar}\langle S - S_1 \rangle\right) \exp(S_1/\hbar)\,\mathcal{D}x(\tau) \sim \exp(-ET/\hbar)$

Now, by virtue of the lemma we know that

(30) $E \geq E_0$


Thus, no matter what our choice for $S_1$, the value obtained for $E$
will always overestimate the true lowest energy state $E_0$. The
smaller the value we get for $E$ the better our estimate for $E_0$. In
order to evaluate $E$ we must solve the two path integrals


$\int \exp(S_1/\hbar)\,\mathcal{D}x(\tau) = \exp(-E_1 T/\hbar)$
$\int (S - S_1) \exp(S_1/\hbar)\,\mathcal{D}x(\tau) = \langle S - S_1 \rangle \int \exp(S_1/\hbar)\,\mathcal{D}x(\tau)$


For clever choices of $S_1$, these two path integrals can be solved
even though the path integral of equation (25) cannot.

We can improve this technique somewhat by including in our
choice of $S_1$ some undefined parameters. Then, when $E$ has been
evaluated in terms of these parameters, we can vary them to find
the minimum value of $E$, and thus improve our estimate of $E_0$.
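
The logic of (26) to (30), including the variation of a parameter, can be imitated with an ordinary one-dimensional integral in place of the path integral. This is only a toy analogue: we assume $S = -x^4$, a trial form $S_1 = -ax^2/2$ with one free parameter $a$, and $T = \hbar = 1$, none of which comes from the text.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 12001)
dx = x[1] - x[0]

S = -x**4                                    # toy "action" (assumption: not a real path integral)
E0 = -np.log(np.sum(np.exp(S)) * dx)         # plays the role of E_0 with T = hbar = 1

def E_trial(a):
    """Bound on E0 from the trial action S1 = -a x^2 / 2, following (27) to (29)."""
    S1 = -0.5 * a * x**2
    w = np.exp(S1)
    Z1 = np.sum(w) * dx                      # integral of exp(S1)
    avg = np.sum((S - S1) * w) * dx / Z1     # <S - S1> with weight exp(S1)
    return -(avg + np.log(Z1))               # exp(-E) = exp(<S - S1>) * Z1

a_values = np.linspace(0.5, 10.0, 96)
E_values = np.array([E_trial(a) for a in a_values])
best = np.argmin(E_values)
print(a_values[best], E_values[best], E0)    # the best bound stays above E0
```

Every trial value lies above the true one, and scanning the parameter picks out the tightest bound, which is the sense in which varying the parameters improves the estimate.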
In this paper I have attempted to follow up Professor Kac’s 
description of various kinds of probability measures with an ex- 
planation of the peculiar measure used in quantum mechanics, 
together with a description of the rather novel probability con- 
ventions which apparently must be adopted in order to explain the 
behavior of nature’s fundamental particles. I have shown you 
how this measure and the associated theory can be manipulated, 
not only to solve the problems which arise in the study of quantum 
mechanics, but also to deepen our understanding of fundamental 
physical laws. This path integral approach has not produced any 
startling innovations in the field of quantum mechanics. How- 
ever, in many cases (for instance, the last example that I described) 
the technique can produce better answers with much less labor. 
But perhaps the most important contribution of the path-integral 
approach is the way it assists our intuitive understanding of nature. 
For references for this appendix see R. Feynman, “The Con- 
cept of Probability in Quantum Mechanics,” Proceedings of the 
Second Berkeley Symposium on Mathematical Statistics and Prob- 
ability, p. 533; “The Space-Time Approach to Non-Relativistic 
Quantum Mechanics,” Rev. Modern Phys., 20, 367, 1948; “Slow 
Electrons in a Polar Crystal,” Phys. Rev., 97, 660, 1955. 


APPENDIX III 


Smoothing and “Unsmoothing”
By BALTH. VAN DER POL 


1. The physicist, the astronomer, the economist, or any other 
investigator who makes frequent use of graphical representations 
of rapidly and especially erratically varying functions, often feels 
the need for “smoothing” his observations, because the “raw data” 
furnished by his observations are often too irregular to be useful 
for further analysis or study. 

This smoothing can be done in different ways. For example, 
it can be done (and occasionally is done) by “hand” or by “eye.” 
The latter procedure is perhaps more of an artistic than of a 
scientific nature. 

In this paper we will be concerned with a perfectly well- 
defined way of smoothing. It is variously called the moving aver- 
age, or the sliding mean. We will stick here to the first nomencla- 
ture. 

The moving average g*(x) of a given single-valued real func- 
tion g(x) of one variable x (say the price of pig-iron as a function 
of time) is experimentally obtained as follows. 

We first plot the original function g(x). It may show great 
irregularities. Next we decide over what total length a we shall 
smooth. On the graph we then put two pieces of paper with 
straight vertical edges a distance a apart, so that only a vertical 
strip of the original drawing remains visible. We mark the 
arithmetical average A of the visible part of the original curve; 
then we move the two pieces of paper a little to the right, keeping 
their distance a constant and again mark the average B of the then 
visible part of the original curve, and so on. The curve g*(x),
obtained by joining the points A, B,... is the moving average 
g*(x) of the original function g(x). 




The procedure described shows that the moving average 
$g^*(x)$ for any value of $x$ is defined as the mean value of $g(x)$ on the
interval between $x - \tfrac{1}{2}a$ and $x + \tfrac{1}{2}a$, of total length $a$.
Analytically it is therefore represented by:


(1) $g^*(x) = \frac{1}{a} \int_{x - a/2}^{x + a/2} g(\xi)\,d\xi = \frac{1}{a} \int_{-a/2}^{a/2} g(x + \xi)\,d\xi$

It will be obvious that the moving average in practice might 
be much smoother than the original function. 
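
The definition (1) translates directly into a short numerical routine. The sketch below uses the midpoint rule for the integral; the function name `moving_average`, the number of quadrature points, and the test function $g(x) = x^2$ (whose moving average is $x^2 + a^2/12$) are our own illustrative choices.

```python
import numpy as np

def moving_average(g, x, a, n_quad=201):
    """Moving average (1) of g at the points x, using the midpoint rule on [-a/2, a/2]."""
    h = a / n_quad
    xi = -a / 2 + h * (np.arange(n_quad) + 0.5)       # midpoints of the subintervals
    # the mean of g(x + xi) equals (1/a) times the integral, since h / a = 1 / n_quad
    return np.mean(g(np.asarray(x)[:, None] + xi[None, :]), axis=1)

x = np.linspace(0.0, 10.0, 101)
g = lambda u: u**2                    # a test function whose average is known exactly
g_star = moving_average(g, x, a=2.0)
print(np.max(np.abs(g_star - (x**2 + 2.0**2 / 12))))   # small quadrature error
```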

A typical case of a moving average is obtained when we re- 
produce a sound film. The blackening on the film represents the 
original function g(x). When we reproduce the sound film by let- 
ting a photocell catch the light of a homogeneously illuminated slit 
of a finite width behind which the film passes, the function g*(x)
thus obtained will be the moving average of g(x) and thus in fact
the function g*(x) will be heard rather than g(x) itself.


2. The latter, technical, procedure at once raises the fundamental 
question: given the moving average g* (x) (together with the interval 
a), can the original function g(x) be retraced uniquely, and, if so, 
under what circumstances? The problem can therefore be formulated
as the “unsmoothing” of smoothed functions.

At the outset it will be clear that there certainly are functions 
such that their moving average $g^*(x)$ will not enable us to retrace
$g(x)$. If, e.g., $g(x)$ is of the form


$g(x) = A \sin(2\pi x/a + \phi)$


then one complete oscillation of the function just fits the width of
the slit. In this example therefore $g^*(x) = 0$. But the same
vanishing of $g^*(x)$ would be obtained for any periodic function
$g(x)$ of zero mean having a period $a$ or a submultiple thereof. Hence
in general our fundamental question cannot be answered in the affirmative.
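
The loss of uniqueness can be seen numerically: two different functions that differ by a sine of period $a$ have the same moving average. In this sketch the Gaussian test function, the amplitude and phase of the added sine, and the helper `moving_average` are assumptions chosen for illustration.

```python
import numpy as np

def moving_average(g, x, a, n_quad=400):
    """Moving average over a window of width a, by the midpoint rule."""
    h = a / n_quad
    xi = -a / 2 + h * (np.arange(n_quad) + 0.5)
    return np.mean(g(np.asarray(x)[:, None] + xi[None, :]), axis=1)

a = 1.5
x = np.linspace(-3.0, 3.0, 61)
g1 = lambda u: np.exp(-u**2)
g2 = lambda u: np.exp(-u**2) + 0.7 * np.sin(2 * np.pi * u / a + 0.3)

gap = np.max(np.abs(moving_average(g1, x, a) - moving_average(g2, x, a)))
print(gap)                                  # at rounding level
print(np.max(np.abs(g1(x) - g2(x))))        # the originals clearly differ
```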

Clearly our problem concerns the solution of equation (1), 
considered as an integral equation where g*(x) is supposed to be 
given, and it is required to find g(x). 

A convenient and elegant way of tackling our problem is with
the help of the operational calculus, when we base the latter on
the two-sided Laplace transform.

For this purpose we will use the following notation.
Given an “original” function $h(x)$, we construct its “image” $f(p)$
through the integral


(2) $f(p) = p \int_{-\infty}^{\infty} \exp(-px)\,h(x)\,dx$


provided there exists a strip $\alpha < \mathrm{Re}\,p < \beta$ in the complex plane in
which the latter integral converges. We then write for short


$f(p) \risingdotseq h(x)$ or $h(x) \fallingdotseq f(p)$ $(\alpha < \mathrm{Re}\,p < \beta)$

From (2) we can immediately conclude the “shift rule.” If

$h(x) \fallingdotseq f(p)$, $\alpha < \mathrm{Re}\,p < \beta$,

then

(3) $h(x + \lambda) \fallingdotseq \exp(p\lambda)\,f(p)$


in the same strip, $\lambda$ having any real value.
Let us go back to (1), and let us assume that the operational
image of $g(x)$ is $f(p)$, or

$g(x) \fallingdotseq f(p)$

and that similarly the operational image of $g^*(x)$ is $f^*(p)$, or

$g^*(x) \fallingdotseq f^*(p)$


Then if we transpose (1) into the $p$-language, we obtain


(4) $f^*(p) = \frac{1}{a} \int_{-a/2}^{a/2} \exp(p\xi)\,f(p)\,d\xi = \frac{\sinh \frac{1}{2}pa}{\frac{1}{2}pa}\,f(p)$


Hence (4) tells us that the image $f^*(p)$ of the smoothed function
$g^*(x)$ is obtained by simply multiplying the image $f(p)$ of the
unsmoothed function $g(x)$ by the factor $(\sinh \frac{1}{2}pa)/(\frac{1}{2}pa)$, or

(5) $f^*(p) = \frac{\sinh \frac{1}{2}pa}{\frac{1}{2}pa}\,f(p)$
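
A quick plausibility check of the multiplier in (5): the moving average of a single exponential $\exp(px)$ is the same exponential multiplied by $(\sinh \frac{1}{2}pa)/(\frac{1}{2}pa)$. The sketch below verifies this by direct quadrature; the particular values of `a`, `p` and `x0` are arbitrary assumptions.

```python
import numpy as np

a, p = 1.3, 0.8 + 2.0j                     # illustrative window width and complex p
x0 = 0.4                                   # point at which the average is taken
n = 2000
h = a / n
xi = -a / 2 + h * (np.arange(n) + 0.5)     # midpoint-rule nodes on [-a/2, a/2]

smoothed = np.mean(np.exp(p * (x0 + xi)))  # moving average (1) of exp(p x) at x0
factor = np.sinh(p * a / 2) / (p * a / 2)  # multiplier appearing in (5)
print(abs(smoothed - factor * np.exp(p * x0)))
```

The factor vanishes when $pa$ is a nonzero multiple of $2\pi i$, which is the operational counterpart of the sine wave that just fits the slit.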

The identity (5) expresses $f^*(p)$ in terms of $f(p)$. But it at once
enables us to solve for $f(p)$, because by simple division we obtain


(6) $f(p) = \frac{\frac{1}{2}pa}{\sinh \frac{1}{2}pa}\,f^*(p)$

In the operational $p$-language, (6) is the solution of our problem
because it gives the image $f(p)$ of the unknown original $g(x)$ in
terms of the image $f^*(p)$ of the known, smoothed, function $g^*(x)$.
However, the problem remains to interpret (6) in the language of 
the original variable x. This can be done in many different ways 
leading to as many procedures to retrace the original g(x) from the 
given smoothed function g* (x). As is to be expected, questions of 
convergence present themselves. 


3. Let us make the very physical assumption that our unsmoothed
function $g(x)$ is cut off on both sides, i.e., it is zero outside
a finite interval. Then the limits of the “definition” integral
(2) for $f(p)$ may be replaced by finite values. Under these assumptions
(and in the absence of poles on the real axis) (2) will converge
for any value of $p$. We then are free to assume $\mathrm{Re}\,p$ to be
positive. If, for simplification, we further take $a = 1$, we can
write in (6)


(7) $\frac{\frac{1}{2}p}{\sinh \frac{1}{2}p} = p\,(\exp(\tfrac{1}{2}p) - \exp(-\tfrac{1}{2}p))^{-1}$
    $= p\,\{\exp(-\tfrac{1}{2}p) + \exp(-\tfrac{3}{2}p) + \exp(-\tfrac{5}{2}p) + \ldots\}$


Interpreting the factor $p$ as $d/dx$, the original of (6) can, with the
help of (7), then be written as


(8) $g(x) = \frac{d}{dx}\{g^*(x - \tfrac{1}{2}) + g^*(x - \tfrac{3}{2}) + g^*(x - \tfrac{5}{2}) + \ldots\}$


Since, together with $g(x)$, $g^*(x)$ also will certainly be cut off at both
sides, the series (8) automatically terminates, and (8) gives us the
exact solution of our problem of retracing $g(x)$ from the given
smoothed function $g^*(x)$.
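
Formula (8) can be tried on a function that is cut off on both sides. In the sketch below we assume the test function $g(x) = \cos^2(\pi x/4)$ on $[-2, 2]$ (zero elsewhere), obtain its moving average for $a = 1$ exactly from an antiderivative, and then apply (8) with the derivative replaced by a central difference. The test function, the step `h`, and the function names are illustrative assumptions.

```python
import numpy as np

def G(x):
    """Antiderivative of the test function g below (constant outside [-2, 2])."""
    xc = np.clip(x, -2.0, 2.0)
    return xc / 2 + np.sin(np.pi * xc / 2) / np.pi

def g(x):
    """Test function cut off on both sides: cos^2(pi x / 4) on [-2, 2], zero elsewhere."""
    return np.where(np.abs(x) < 2.0, np.cos(np.pi * x / 4) ** 2, 0.0)

def g_star(x):
    """Moving average (1) of g with a = 1, evaluated exactly from G."""
    return G(x + 0.5) - G(x - 0.5)

def unsmooth(x, n_terms=12, h=1e-4):
    """Formula (8): derivative (central difference) of the sum of shifted g*."""
    total = lambda u: sum(g_star(u - 0.5 - k) for k in range(n_terms))
    return (total(x + h) - total(x - h)) / (2 * h)

x = np.linspace(-3.0, 3.0, 121)
print(np.max(np.abs(unsmooth(x) - g(x))))   # small: g is recovered from g*
```

Twelve terms are more than enough here because every further shifted copy of $g^*$ vanishes on the range of $x$ considered, which is the sense in which the series terminates.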

The physical sense of (8) can best be illustrated with the help 
of the aforementioned sound film $g(x)$ which we consider reproduced
and rerecorded through a slit of finite width $a$, leading to a
new recorded film $g^*(x)$ representing the moving average of $g(x)$.
In order to retrace $g(x)$, the expression (8) tells us that we have to
construct a long metal plate which can completely cover the film
$g^*(x)$ and have to cut infinitely thin slits in this plate a distance $a$
(i.e., the original slit width) apart. We illuminate the whole metal 
plate, while moving the smoothed film underneath, collecting the 
totality of all the light passing through these many infinitely thin 
slits. If we differentiate the resulting current from the photocell 
and send this through a loudspeaker the sound will correspond to 
the undistorted original film g(x). 


4. Returning to (6), we notice that the operator $(\frac{1}{2}pa)/(\sinh \frac{1}{2}pa)$
may also be written as


(9) $pa\,(\exp(\tfrac{1}{2}pa) - \exp(-\tfrac{1}{2}pa))^{-1} = \exp(\tfrac{1}{2}pa) \cdot pa \cdot (\exp(pa) - 1)^{-1}$
    $= \exp(\tfrac{1}{2}pa) \sum_{k=0}^{\infty} \frac{B_k}{k!} (pa)^k, \quad |pa| < 2\pi$


where $B_k$ are the Bernoullian numbers. The expression (9) may
be considered as their definition; in fact, in somewhat simpler form,
we have the generating function


(10) $t\,(\exp t - 1)^{-1} = \sum_{k=0}^{\infty} \frac{B_k}{k!} t^k, \quad |t| < 2\pi$

The application of (9) to (6), remembering that the factor
$\exp(\frac{1}{2}pa)$ simply represents a shift over $a/2$ in the $x$ field, now
yields the solution of our integral equation (1) in the form


(11) $g(x) \sim \sum_{k=0}^{\infty} \frac{B_k}{k!} a^k \frac{d^k}{dx^k} g^*(x + \tfrac{1}{2}a)$


In (11) we used the sign for asymptotic equality, since, in view of
the limiting condition $|pa| < 2\pi$, in general convergence of (11)
may not be expected. However, if $g^*(x + \frac{1}{2}a)$ happens to have the
form of a polynomial, then (11) breaks off and equality is guaranteed.
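
The polynomial case of (11) can be carried out exactly with rational arithmetic. As an illustration we take $g(x) = x^2$, whose moving average is $g^*(x) = x^2 + a^2/12$, and recover $g$ from $g^*$. The helper functions below (`bernoulli`, `derivative`, `shift`, `unsmooth_polynomial`) and the value $a = 3$ are our own assumptions; the Bernoullian numbers are generated from the standard recurrence that follows from (10), with $B_1 = -\tfrac{1}{2}$.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli(n_max):
    """Bernoullian numbers B_0..B_n_max from t/(exp t - 1), so B_1 = -1/2."""
    B = [Fraction(1)]
    for m in range(1, n_max + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def derivative(c):
    """Derivative of the polynomial with coefficients c[0] + c[1] x + ..."""
    return [k * c[k] for k in range(1, len(c))] or [Fraction(0)]

def shift(c, s):
    """Coefficients of the polynomial x -> c(x + s)."""
    out = [Fraction(0)] * len(c)
    for n, cn in enumerate(c):
        for k in range(n + 1):
            out[k] += cn * comb(n, k) * s ** (n - k)
    return out

def unsmooth_polynomial(c_star, a):
    """Series (11) applied to a polynomial g*; it stops after deg(g*) + 1 terms."""
    B = bernoulli(len(c_star))
    term = shift(c_star, a / 2)              # coefficients of g*(x + a/2)
    result = [Fraction(0)] * len(c_star)
    for k in range(len(c_star)):
        for j, cj in enumerate(term):
            result[j] += B[k] / factorial(k) * a**k * cj
        term = derivative(term)              # next derivative for the next term
    return result

a = Fraction(3)
g_star = [a**2 / 12, Fraction(0), Fraction(1)]   # x^2 + a^2/12, lowest power first
print(unsmooth_polynomial(g_star, a))            # coefficients of the recovered g
```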

The preceding expression (10) defining the Bernoullian numbers
may be generalized so as to define the Bernoullian polynomials,
which, in connection with our problem of “unsmoothing,” play a
very particular role. In fact we have the generating function


(12) $t \exp(tx)\,(\exp t - 1)^{-1} = \sum_{n=0}^{\infty} B_n(x) \frac{t^n}{n!}$


where $B_n(x)$ are the Bernoullian polynomials. From (10) and (12)
it follows that they can be written as


$B_n(x) = \sum_{k=0}^{n} {}^{n}C_{k}\,B_k\,x^{n-k}$

They are therefore polynomials with coefficients determined by
Bernoullian numbers.
Now it can be shown (we omit the proof) that the following 
important operational image exists: 


$$ p^{-n}\cdot p(\exp p - 1)^{-1} \doteqdot \frac{1}{n!}\{B_n(x) - B_n(x - [x])\}\,U(x), \qquad \mathrm{Re}\,p > 0,\ n \ge 1 \tag{13} $$

The right-hand side of (13) consists of two terms. First the
function $B_n(x)U(x)$ which for positive $x$ coincides with the
Bernoullian polynomial $B_n(x)$ but is zero for negative $x$; and further
the term $B_n(x - [x])\cdot U(x)$ which in the interval $0 < x < 1$
represents the Bernoullian polynomial $B_n(x)$, but which is further
periodically extended to greater values of $x$. We define

$$ U(x) = \begin{cases} 1, & x > 0 \\ 0, & x < 0 \end{cases} $$


We now wish to study the moving average of the function on the 
right of (13). According to (5), (if we choose a = 1), the image of 
the moving average is obtained by multiplying the image of the 
original by 


$$ (\sinh\tfrac12 p)/(\tfrac12 p) = [\exp(\tfrac12 p) - \exp(-\tfrac12 p)]\,p^{-1} = \exp(-\tfrac12 p)(\exp p - 1)\,p^{-1} $$
If therefore we disregard for the moment the “shifting term” 


$\exp(-\tfrac12 p)$, the image of the moving average of the right side of
(13) becomes 


(14) 




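$$ p^{-n}\cdot p(\exp p - 1)^{-1}\cdot(\exp p - 1)\,p^{-1} = p^{-n} \tag{15} $$

Now the well-known original of $p^{-n}$ is $(x^n/n!)\,U(x)$, $(n > -1)$.
Hence, after shifting $x$ to $x + \tfrac12$, we find

$$ \int_x^{x+1} B_n(\xi)\,d\xi - \int_x^{x+1} B_n(\xi - [\xi])\,d\xi = x^n, \qquad (x > 0) $$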


The second integral cancels because it represents a moving average 
over a periodic function, the period of which just “fits” the width 
of the slit, and thus we have proved the important relation 


$$ \int_x^{x+1} B_n(\xi)\,d\xi = x^n \tag{16} $$

to be true for $x > 0$, and by analytical continuation it is therefore
true for all values of $x$. The relation (16) can be expressed as
follows: The (asymmetric) moving average (integral limits $x$ and
$x + 1$ instead of $x - \tfrac12$ and $x + \tfrac12$) of any Bernoullian polynomial
$B_n(x)$ of integral order $n$ is simply $x^n$. Hence in this case the
moving average has a simpler form than the original function.
In fact, inversely, (16) constitutes what is perhaps the most direct
definition of the Bernoullian polynomial $B_n(x)$ as being that
polynomial of degree $n$ whose moving average is $x^n$. Together
with the polynomial itself, (16) therefore also completely defines
the Bernoullian numbers $B_0, B_1, B_2, \ldots, B_n$.
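Relation (16) lends itself to an exact check. The sketch below builds $B_n(x)$ from the Bernoullian numbers as in the formula following (12) and integrates it from $x$ to $x + 1$ with rational arithmetic; the helper names are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """B_0..B_n_max with B_1 = -1/2."""
    B = [Fraction(1)]
    for k in range(1, n_max + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B

def bernoulli_poly(n):
    """Ascending coefficients of B_n(x) = sum_k C(n, k) B_k x^(n-k)."""
    B = bernoulli_numbers(n)
    coeffs = [Fraction(0)] * (n + 1)
    for k in range(n + 1):
        coeffs[n - k] = comb(n, k) * B[k]
    return coeffs

def asymmetric_moving_average(coeffs, x):
    """Integral of the polynomial from x to x + 1 (slit of unit width)."""
    anti = [Fraction(0)] + [c / (k + 1) for k, c in enumerate(coeffs)]
    value = lambda t: sum(c * t**k for k, c in enumerate(anti))
    return value(x + 1) - value(x)

x = Fraction(5, 3)
for n in range(6):
    print(n, asymmetric_moving_average(bernoulli_poly(n), x) == x**n)
```

Negative and zero arguments can be used as well, in line with the remark on analytical continuation.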


5. In this paragraph we wish to draw attention to a geometrical 
interpretation of the Bernoullian numbers relating them at the 
same time to the rational prime numbers 2, 3, 5, 7, 11, 13,... 
We therefore consider again (10) 


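$$ t(\exp t - 1)^{-1} = \sum_{k=0}^{\infty}\frac{B_k}{k!}t^k, \qquad |t| < 2\pi \tag{10} $$

Next in (10) replace $t$ by $-t$. Taking half the sum, we obtain

$$ \tfrac12 t(\exp t + 1)(\exp t - 1)^{-1} = \sum_{k=0}^{\infty}\frac{B_{2k}}{(2k)!}t^{2k} \tag{10a} $$

Finally in (10a) we replace $t$ by $i\phi$, and obtain

$$ (\tfrac12\phi)/(\tan\tfrac12\phi) = \sum_{k=0}^{\infty}(-1)^k\frac{B_{2k}}{(2k)!}\phi^{2k} \tag{10b} $$

The expression (10b) thus represents the ratio of the angle $\tfrac12\phi$
itself to its tangent, or the ratio of an arc to its tangent.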

The numerical values of the first few rational numbers $B_k$
are as follows:

    B_0 = 1            B_7  = 0
    B_1 = -1/2         B_8  = -1/30
    B_2 = 1/6          B_9  = 0
    B_3 = 0            B_10 = 5/66
    B_4 = -1/30        B_11 = 0
    B_5 = 0            B_12 = -691/2730
    B_6 = 1/42


They have a remarkable property, discovered almost simultaneously
by two astronomers over a century ago:¹ they can be expressed
in the form

$$ B_{2n} = G_{2n} - \sum_{(p-1)\,\mid\,2n}\frac{1}{p} \tag{11} $$

The interpretation of (11) is as follows: The Bernoullian number
$B_{2n}$ equals an integer $G_{2n}$ (positive or negative or zero) minus the
sum of the reciprocals of all those prime numbers $p$ which have the
property that $p - 1$ is a divisor of $2n$. As an example we take
$2n = 12$. In order to apply (11) we must therefore first determine
those prime numbers $p$ for which $p - 1$ is a divisor of 12. All the
prime numbers which have this property are $p = 2, 3, 5, 7, 13$.
Hence

$$ B_{12} = G_{12} - \left(\frac12 + \frac13 + \frac15 + \frac17 + \frac1{13}\right) = G_{12} - \frac{3421}{2730} $$

In this case $G_{12} = 1$ (there is no general rule to determine the
integers $G_{2n}$) so that

$$ B_{12} = 1 - 3421/2730 = -691/2730 $$

¹ Von Staudt, J. reine angew. Math. 21, 372-374 (1840). Th. Clausen,
Astronomische Nachrichten, 17, 352-352 (1840).

The fact that properties of prime numbers enter into the MacLaurin
expansion of the simple geometrical ratio of an arc and its
tangent is most remarkable indeed.
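The statement of (11) can be tried out numerically. This is a small sketch, with illustrative function names, that computes $B_{2n}$ exactly, collects the primes $p$ with $p - 1$ dividing $2n$, and returns $G_{2n} = B_{2n} + \sum 1/p$.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n_max):
    """B_0..B_n_max with B_1 = -1/2."""
    B = [Fraction(1)]
    for k in range(1, n_max + 1):
        B.append(-sum(comb(k + 1, j) * B[j] for j in range(k)) / (k + 1))
    return B

def is_prime(p):
    return p >= 2 and all(p % d for d in range(2, int(p**0.5) + 1))

def staudt_clausen(two_n):
    """Return (primes p with (p - 1) | 2n, G_2n) for an even index 2n >= 2."""
    primes = [p for p in range(2, two_n + 2)
              if is_prime(p) and two_n % (p - 1) == 0]
    G = bernoulli_numbers(two_n)[two_n] + sum(Fraction(1, p) for p in primes)
    return primes, G

primes, G = staudt_clausen(12)
print(primes, G)   # the example 2n = 12 treated in the text
```

Running `staudt_clausen` over several even indices shows that $G_{2n}$ always comes out with denominator 1, although its value follows no obvious pattern.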


6. Returning to the general smoothing problem, we remark that
the process of constructing the moving average $g^*(x)$ of a given
function $g(x)$ may be reiterated so as to obtain $g^{**}(x)$, $g^{***}(x)$, ...,
defined by

$$ g^{**}(x) = a^{-1}\int_{-a/2}^{a/2} g^*(x + \xi)\,d\xi, \quad \text{etc.} $$

Clearly the corresponding operational image will be

$$ f^{*(n)}(p) = [(\sinh\tfrac12 pa)/(\tfrac12 pa)]^n f(p) \doteqdot g^{*(n)}(x) \tag{11} $$

The “unsmoothing” can in this case be effectuated by reiteration of
the procedures described above.
Taking $a = 1$, the image in (11) may be written as:

$$ f^{*(n)}(p) = p^{-1}\cdot(2\sinh\tfrac12 p)^n\,p^{-(n-1)}\cdot f(p) = p^{-1}h_n(p)\cdot f(p), \text{ say,} \tag{12} $$

where $h_n(p) = (2\sinh\tfrac12 p)^n\,p^{-(n-1)}$.
The original of (12) is easily constructed. We have

$$ p^{-(n-1)} \doteqdot \frac{x^{n-1}}{(n-1)!}\,U(x) \qquad (\mathrm{Re}\,p > 0) $$

and hence

$$ (2\sinh\tfrac12 p)^n\,p^{-(n-1)} \doteqdot \Delta_x^n\left(\frac{x^{n-1}}{(n-1)!}\,U(x)\right) = h_n(x), \text{ say,} \tag{13} $$

where

$$ \Delta_x\varphi(x) = \varphi(x + \tfrac12) - \varphi(x - \tfrac12) $$

so that $\Delta_x$ is the symmetric difference operator. For increasing $n$
the function $h_n(x)$ becomes smoother and smoother, while the area
under its curve remains equal to unity for any $n$.


From (13) it may further be concluded that, for large $n$, the
image of $\sqrt{n}\,h_n(x\sqrt{n})$ becomes

$$ p\,[(2\sinh(\tfrac12 p/\sqrt{n}))/(p/\sqrt{n})]^n = p\,(1 + p^2/(24n) + \ldots)^n \to p\exp(p^2/24) \tag{14} $$

The original of the last expression is known to be $\sqrt{6/\pi}\,\exp(-6x^2)$,
from which it follows that, for large $n$,

$$ \sqrt{n}\,h_n(x\sqrt{n}) \approx \sqrt{6/\pi}\,\exp(-6x^2) \tag{15} $$

or

$$ h_n(x) \approx \sqrt{6/(\pi n)}\,\exp(-6x^2/n), \qquad n \to \infty \tag{15a} $$

Hence, instead of applying the process of averaging $n$ times, say
with the help of sound films reproduced with a finite slit, we can in
one step arrive at the same result if we illuminate the slit with a
light intensity given by $h_n(x)$ in (13). It is very striking, as shown
by (14) and (15), that if $n$ is taken sufficiently large, this illumination
tends to the Gaussian error function.
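The approach of $h_n(x)$ to the Gaussian form can be illustrated directly. The sketch below assumes the expansion of the $n$-th symmetric difference in (13) as an alternating binomial sum, $\Delta_x^n\varphi(x) = \sum_k (-1)^k\binom{n}{k}\varphi(x + n/2 - k)$, and compares it with the limiting expression (15a); the function names are illustrative.

```python
from math import comb, exp, factorial, pi, sqrt

def h(n, x):
    """n-fold iterated unit-slit kernel: Delta^n applied to x^(n-1)/(n-1)! U(x)."""
    total = 0.0
    for k in range(n + 1):
        t = x + n / 2 - k
        if t > 0:                       # the unit function U cuts off t <= 0
            total += (-1) ** k * comb(n, k) * t ** (n - 1)
    return total / factorial(n - 1)

def gaussian_limit(n, x):
    """Right-hand side of (15a)."""
    return sqrt(6 / (pi * n)) * exp(-6 * x * x / n)

n = 12
for x in (0.0, 0.5, 1.0, 2.0):
    print(x, h(n, x), gaussian_limit(n, x))

# Area under h_n by the trapezoidal rule on [-n/2, n/2]
step = 0.01
points = [-n / 2 + i * step for i in range(int(n / step) + 1)]
area = step * (sum(h(n, t) for t in points) - 0.5 * (h(n, points[0]) + h(n, points[-1])))
print(area)
```

For $n = 1$ the function reduces to the rectangular slit and for $n = 2$ to a triangle; by $n = 12$ the two columns printed above differ only by about a percent near the center.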


7. So far we discussed the moving average of a function g(x) of 
one variable x only. The same fundamental idea both with respect 
to smoothing and “‘unsmoothing’’ may equally well be applied in 
two dimensions to say a photograph, a television image, a telescope 
image, etc. Due to diffraction or finite spot size the images as 
practically obtained represent approximately a moving average of 
the ideal image. It seems not excluded that with the help of the 
procedures of “‘unsmoothing,” outlined above, or extensions there- 
of, improved images may be obtained. To what extent back- 
ground noise would limit the applicability of the ““unsmoothing”’ 
methods will probably depend on the individual cases considered. 

An exact mathematical generalization to two dimensions of 
the above described one-dimensional smoothing procedure would 
consist in replacing the original two-dimensional function (say a 
photo) by another one, where the local blackening is determined by 
the mean value of the blackening of the original over a concentric 
circle or over a square. 




As is often the case in potential and wave problems, also here
we find that the solutions are somewhat more complicated in a
space of an even number of dimensions than when we are concerned
with an odd number.

We will therefore in the following paragraph give a short 
outline of our smoothing and “‘unsmoothing”’ problem in three 
dimensions. 


8. Suppose we are given any function $g(x, y, z)$. Its three-dimensional
moving average $g^*(x, y, z)$ is obtained by replacing
the value of $g(x, y, z)$ at each point $(x, y, z)$ by the mean value over
the volume of a sphere of radius $a$ and with the point $(x, y, z)$ as
center. Analytically this amounts to the following definition of
$g^*(x, y, z)$:

$$ g^*(x, y, z) = \frac{3}{4\pi a^3}\int_0^a r^2\,dr\int_0^{\pi}\sin\theta\,d\theta\int_0^{2\pi}d\phi\; g(x + r\sin\theta\cos\phi,\; y + r\sin\theta\sin\phi,\; z + r\cos\theta) \tag{16} $$


We now transform (16) with the help of a three-dimensional
simultaneous operational calculus, based on the threefold integral

$$ f(p_1, p_2, p_3) = p_1 p_2 p_3\int_{-\infty}^{+\infty}\exp(-p_1 x)\int_{-\infty}^{+\infty}\exp(-p_2 y)\int_{-\infty}^{+\infty}\exp(-p_3 z)\,g(x, y, z)\,dx\,dy\,dz $$

which latter relation we write for short as

$$ f(p_1, p_2, p_3) \doteqdot g(x, y, z), \qquad \alpha_k < \mathrm{Re}\,p_k < \beta_k, \quad k = 1, 2, 3 \tag{17} $$

If, as before in the one-dimensional case, we assume the smoothed
function $g^*(x, y, z)$ to have the image $f^*(p_1, p_2, p_3)$, (16) becomes:

$$ f^*(p_1, p_2, p_3) = \frac{3}{4\pi a^3}\int_0^a r^2\,dr\int_0^{\pi}\sin\theta\,d\theta\int_0^{2\pi}d\phi\,\exp(p_1 r\sin\theta\cos\phi + p_2 r\sin\theta\sin\phi + p_3 r\cos\theta)\cdot f(p_1, p_2, p_3) \tag{18} $$

The threefold integral (18) is well known in wave theory. In fact,
(18) can be written as

$$ f^*(p_1, p_2, p_3) = \frac{3}{a^3}\int_0^a r^2\,dr\,\frac{\sinh Vr}{Vr}\,f(p_1, p_2, p_3); \qquad V = (p_1^2 + p_2^2 + p_3^2)^{1/2} \tag{19} $$


so that (19) becomes

$$ f^*(p_1, p_2, p_3) = \frac{3}{(aV)^2}\left[\cosh aV - \frac{\sinh aV}{aV}\right] f(p_1, p_2, p_3) \tag{20} $$

giving the image of the smoothed function $g^*(x, y, z)$ in terms of
the image of the unsmoothed function $g(x, y, z)$ with the help of
the “operator” $\Omega$, where

$$ \Omega(p_1, p_2, p_3) = \frac{3}{(aV)^2}\left[\cosh aV - \frac{\sinh aV}{aV}\right] \tag{21} $$

With the latter interpretation we are able at once to translate (20)
into the $(x, y, z)$-language:

$$ g^*(x, y, z) = \frac{3}{(aV)^2}\left[\cosh aV - \frac{\sinh aV}{aV}\right] g(x, y, z) \tag{22} $$

where now

$$ V^2 = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} \tag{22a} $$


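Let us investigate somewhat closer the physical meaning of (22).
To this end we note that, for small $a$, (22) becomes

$$ g^*(x, y, z) = \left(1 + \frac{a^2V^2}{10} + \frac{a^4V^4}{280} + \frac{a^6V^6}{15120} + \ldots\right) g(x, y, z) \tag{23} $$

where

$$ V^2 = \Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2} $$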


Hence, when the smoothing is done over an infinitely small sphere, 
the smoothed function equals the unsmoothed function, as it 
should. 

But (23) also shows us that, if our original function $g(x, y, z)$
has an infinite number of derivatives (and (23) can therefore be
applied), and if moreover $\Delta g(x, y, z) = 0$, or in other words, if our
original function satisfies the potential equation, then again
$g^*(x, y, z) = g(x, y, z)$ and the smoothed function equals the
unsmoothed one, this time for any $a$. The latter property of a potential
function is well known in potential theory; it is here derived
quite naturally in a general theory of smoothing. In the special
case of one dimension, the function is $g(x) = x$, and obviously here
the smoothed function equals the unsmoothed one.
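The coefficients of the expansion (23) follow from the power series of $\cosh$ and $\sinh$: the coefficient of $(aV)^{2k}$ in $3(aV)^{-2}[\cosh aV - (\sinh aV)/(aV)]$ is $3\cdot 2(k+1)/(2k+3)!$. A minimal sketch with illustrative names, treating $aV$ as an ordinary number $u$ for the numerical comparison:

```python
from fractions import Fraction
from math import cosh, factorial, sinh

def omega_coefficient(k):
    """Coefficient of u^(2k) in 3/u^2 * (cosh u - sinh(u)/u), with u = aV."""
    j = k + 1
    return Fraction(3 * 2 * j, factorial(2 * j + 1))

def omega_closed_form(u):
    return 3 / u**2 * (cosh(u) - sinh(u) / u)

def omega_series(u, terms=8):
    return sum(float(omega_coefficient(k)) * u ** (2 * k) for k in range(terms))

print([omega_coefficient(k) for k in range(4)])
print(omega_closed_form(0.5), omega_series(0.5))
```

Every term beyond the first carries a positive power of $V^2$, which is why the series collapses to 1 when the operator acts on a function with $\Delta g = 0$.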


9. In the three-dimensional case, we may also (as in one dimension)
investigate the possibilities of “unsmoothing” when the
smoothed function $g^*(x, y, z)$ is given. Writing (20) as

$$ f^*(p_1, p_2, p_3) = \Omega(p_1, p_2, p_3)\cdot f(p_1, p_2, p_3) \tag{20a} $$

we at once obtain the operational solution

$$ f(p_1, p_2, p_3) = \Omega^{-1}(p_1, p_2, p_3)\,f^*(p_1, p_2, p_3) \tag{21} $$

and again we have to study the “translation” of (21) into the
original variables $(x, y, z)$. To this end we write


aV = a(pi + Hi +p 


Then we can define the rational numbers $\beta_n$ through the power
series of the inverse operator $\Omega^{-1}(2t)$

exp (— 4H) £ f l T Sa 
99) 8 = — expt — — (expt—1)} = ¥ Se 
in analogy with (10), where a similar power series determined the
Bernoullian numbers. The first few $\beta_n$'s are


fo = 1 i fs = — 39 
fı = 5 ba = — 350 


As in the case of the Bernoullian numbers $B_n$, similar recurrence
relations exist for the $\beta_n$'s. Unfortunately it remains an
open question whether a generalization of the Von Staudt-Clausen
theorem exists for the numbers $\beta_n$.


APPENDIX IV 


The Finite-Difference Analogy of the 
Periodic Wave Equation and the Potential Equation 


By BALTH. VAN DER POL 


1. It is well known that two axially symmetrical solutions of the
two-dimensional time periodic wave equation ($k$ real)

$$ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} + k^2 u = 0 \tag{1} $$

are

$$ u = u_1(x, y) = A\,H_0^{(1)}(k\rho), \tag{2a} $$
$$ u = u_2(x, y) = B\,H_0^{(2)}(k\rho), \tag{2b} $$

or linear combinations of $u_1$ and $u_2$ such as

$$ u(x, y) = \tfrac12 C\{H_0^{(1)}(k\rho) + H_0^{(2)}(k\rho)\} = C\,J_0(k\rho) $$

Here $\rho = (x^2 + y^2)^{1/2}$ and $H_0^{(1)}$, $H_0^{(2)}$ are the Hankel functions, $J_0$ is
the zeroth order Bessel function, and $A$, $B$, and $C$ are constants.
Similarly two axially symmetrical solutions of the corresponding
two-dimensional potential equation

$$ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0 \tag{3} $$

are

$$ u = u_1(x, y) = A \tag{4a} $$
$$ u = u_2(x, y) = B\log\rho \tag{4b} $$

or linear combinations of $u_1$ and $u_2$.
It is the purpose of the present section of this appendix to
consider solutions of the corresponding partial finite difference
equations (with constant coefficients)

$$ (\Delta_m^2 + \Delta_n^2 + k^2)\,u(m, n) = 0 \tag{5} $$

and

$$ (\Delta_m^2 + \Delta_n^2)\,u(m, n) = 0 \tag{6} $$

where $m$ and $n$ are positive or negative integers, including zero.
The symmetrical difference operator $\Delta_n$ is defined by

$$ \Delta_n f(n) = f(n + \tfrac12) - f(n - \tfrac12) $$

and hence

$$ \Delta_m^2 f(m) = f(m + 1) - 2f(m) + f(m - 1) $$

Thus (5) and (6) can also be written as follows:

$$ u(m+1, n) + u(m-1, n) + u(m, n+1) + u(m, n-1) + (k^2 - 4)\,u(m, n) = 0 \tag{5a} $$

and

$$ u(m+1, n) + u(m-1, n) + u(m, n+1) + u(m, n-1) - 4u(m, n) = 0 \tag{6a} $$
Also the inhomogeneous cases, where (5) and (6) have the right
member different from zero, will have our attention. Especially
the “discrete potential equation”

$$ (\Delta_m^2 + \Delta_n^2)\,u(m, n) = \begin{cases} 1, & m = n = 0 \\ 0, & \text{otherwise} \end{cases} \tag{7} $$

will be considered in detail (including a complete numerical
discussion of its solution) together with a simple concrete physical
application of (7).

2. Let us first consider

$$ (\Delta_m^2 + \Delta_n^2 + k^2)\,u(m, n) = \begin{cases} 1, & m = n = 0 \\ 0, & \text{otherwise} \end{cases} \tag{8} $$

It is not difficult to verify that a solution of it is given by 


$$ u = u_1(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}\exp[i(m\phi + n\psi)]\,[2\cos\phi + 2\cos\psi + k^2 - 4]^{-1}\,d\psi \tag{9} $$

In fact we note that the operator $\Delta_m^2$ operating on $\exp[i(m\phi + n\psi)]$
yields

$$ \{\exp(i\phi) - 2 + \exp(-i\phi)\}\exp[i(m\phi + n\psi)] $$

and hence the operator $\Delta_m^2 + \Delta_n^2 + k^2$ gives

$$
\begin{aligned}
&\{\exp(i\phi) - 2 + \exp(-i\phi) + \exp(i\psi) - 2 + \exp(-i\psi) + k^2\}\exp[i(m\phi + n\psi)] \\
&\quad = \{2\cos\phi + 2\cos\psi + k^2 - 4\}\exp[i(m\phi + n\psi)]
\end{aligned}
$$

The extra factor which thus arises equals the denominator in the
integrand of (9). Thus we have

$$ (\Delta_m^2 + \Delta_n^2 + k^2)\,u_1(m, n; k) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\,\exp[i(m\phi + n\psi)] = \begin{cases} 1, & m = n = 0 \\ 0, & \text{otherwise} \end{cases} $$

Hence the function $u_1(m, n)$ as given by (9) actually satisfies the
partial difference equation (8). This solution is fully symmetrical
with respect to $n$ and $m$. Also it does not change if we replace $n$
by $-n$ and (or) $m$ by $-m$, because the denominator in (9) is even
in both $\phi$ and $\psi$. Hence we may write (9) in the somewhat more
general form

$$ u_1(m, n; k) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\,\exp[i(\pm m\phi \pm n\psi)]\,[2\cos\phi + 2\cos\psi + k^2 - 4]^{-1} \tag{10} $$

where the signs $\pm$ mean that any one, independent of the other,
may be chosen at will $+$ or $-$.

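If $k^2 > 8$ the denominator is positive, so that in this case (10)
may further be transformed as follows:

$$
\begin{aligned}
u_1(m, n; k) &= (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\,\exp[i(\pm m\phi \pm n\psi)]\cdot\tfrac12\int_0^{\infty}\exp[-\alpha(\cos\phi + \cos\psi + \tfrac12 k^2 - 2)]\,d\alpha \\
&= \tfrac12\int_0^{\infty}d\alpha\,\exp[-\alpha(\tfrac12 k^2 - 2)]\cdot(2\pi)^{-1}\int_0^{2\pi}\exp[\pm im\phi - \alpha\cos\phi]\,d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}\exp[\pm in\psi - \alpha\cos\psi]\,d\psi \\
&= \tfrac12(-1)^{m+n}\int_0^{\infty}\exp[-\alpha(\tfrac12 k^2 - 2)]\,I_{|m|}(\alpha)\,I_{|n|}(\alpha)\,d\alpha
\end{aligned}
\tag{11}
$$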


Whereas in (11), m and n still occur symmetrically, we can, at the 
expense of losing this symmetry, and for the purpose of obtaining 
numerical results, perform one integration in (10), which is then 
left in the form of a single elliptical integral. 

With the aid of the latter integral, it can be shown that (10)
has logarithmic singularities at the five branch points $k = 0, \pm 2,
\pm 2\sqrt{2}$. We will however not pursue (10) any further, but will
now turn our attention to the “discrete potential equation,” which
in many respects shows a simpler behavior.


3. The partial inhomogeneous finite difference equation for the
“discrete potentials” in the integer points $(m, n)$ next to be
considered is

$$ (\Delta_m^2 + \Delta_n^2)\,u_2(m, n) = \begin{cases} 1, & m = n = 0 \\ 0, & \text{in all other points } (m, n) \end{cases} $$

[Figure 1. A lattice point e and its four surrounding points a, b, c, d.]

According to the definition of $\Delta_m^2 + \Delta_n^2$ this equation requires (see
Fig. 1) that, if the values of $u_{m,n}$ are plotted in a square lattice,
the value at any point e, except at the origin, should be the
arithmetical mean of those at the surrounding points a, b, c, and d.

For instance the numbers on a monthly calendar, such as 


          5   12   19   26
          6   13   20   27
          7   14   21   28
     1    8   15   22   29
     2    9   16   23   30
     3   10   17   24   31
     4   11   18   25

satisfy this requirement. 
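The calendar example can be checked mechanically. In the sketch below the layout of the printed month is an assumption read off the table above (one step down adds 1, one step to the right adds 7, with day 1 in the fourth row of the first column); the names are illustrative.

```python
def calendar_value(row, col):
    """Day number at (row, col); row 3 of column 0 holds day 1."""
    return 1 + (row - 3) + 7 * col

# Keep only the days actually printed on the calendar
days = {(r, c): calendar_value(r, c)
        for r in range(7) for c in range(5)
        if 1 <= calendar_value(r, c) <= 31}

def neighbours(r, c):
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

# Points whose four neighbours are all printed days
interior = [p for p in days if all(q in days for q in neighbours(*p))]

# Each interior day equals the arithmetical mean of its four neighbours
print(all(4 * days[p] == sum(days[q] for q in neighbours(*p)) for p in interior))
```

The property holds because the day number is a linear function of the row and column indices, and every linear function satisfies (6a).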

In order to obtain a solution of (7) it is not possible to simply
consider equation (10) and to take $k = 0$, because the resulting
expression would be divergent. The artifice to be used is to take
instead of $u_1(m, n; k)$ of (10) the function $u_2(m, n)$ defined by

$$ u_2(m, n) = \lim_{k\to 0}\{u_1(m, n; k) - u_1(0, 0; k)\} $$

We thus obtain the function

$$ u_2(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\,[1 - \exp(i(\pm m\phi \pm n\psi))]\,[4 - 2\cos\phi - 2\cos\psi]^{-1} \tag{12} $$

which is convergent and which we shall further denote simply by
$u(m, n)$. It is also easily proved, along the lines followed before,
that (12) indeed satisfies (7).

Sobolev (see Section 8) has shown that if a solution of (7)
with the right-hand member everywhere zero grows more slowly
than $(m^2 + n^2)^{1/2}$ at infinity, then this solution must be a constant.
The “discrete” potential (12), which has the property $u_2(0, 0) = 0$,
behaves at infinity as $\log(m^2 + n^2)^{1/2}$ and in fact is the only one
which does so. Hence our solution $u_2(m, n)$ is the one which
corresponds to the “smooth” two-dimensional potential $\log(x^2 + y^2)^{1/2}$.
It is this function $u_2(m, n)$, as given by (12), which we shall now
study in some detail.


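3a. First we remark that we may replace the numerator in the
integrand of (12) by $1 - \cos(\pm m\phi \pm n\psi)$ because the remaining
odd part $-i\sin(\pm m\phi \pm n\psi)$ cancels on integration.

With the latter replacement (12) can be written as

$$ u(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\;\tfrac12\sin^2[\tfrac12(m\phi + n\psi)]\,[\sin^2(\tfrac12\phi) + \sin^2(\tfrac12\psi)]^{-1} \tag{13} $$

or as

$$ u(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\;\tfrac12\sin^2(m\phi + n\psi)\,[\sin^2\phi + \sin^2\psi]^{-1} $$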

It is of interest to compare (13) with what, in a similar way, is
obtained as a solution of the corresponding one-dimensional case:

$$ u(n) = (2\pi)^{-1}\int_0^{2\pi}\tfrac12[1 - \exp(\pm in\phi)]\,[1 - \cos\phi]^{-1}\,d\phi = (2\pi)^{-1}\int_0^{\pi}\sin^2 n\phi\,(\sin^2\phi)^{-1}\,d\phi \tag{13a} $$

The last integral is known to be equal to $\tfrac12|n|$, which indeed is the
analogous solution of

$$ \Delta_n^2\,u(n) = \begin{cases} 1, & n = 0 \\ 0, & \text{for all other } n \end{cases} $$

Returning to (13) we see at once that

$$ u(0, 0) = 0 \tag{14} $$

and also, the value of $u(1, 0)$ can easily be obtained. We have
from (13):

$$ u(1, 0) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\;\tfrac12\sin^2\phi\,[\sin^2\phi + \sin^2\psi]^{-1} $$

This equals the same expression with $\phi$ replaced by $\psi$. Taking the
sum we obtain at once

$$ 2u(1, 0) = \tfrac12 $$

and hence

$$ u(1, 0) = \tfrac14 $$

Remembering that, due to symmetry,

$$ u(1, 0) = u(-1, 0) = u(0, 1) = u(0, -1) $$

this value could also have been directly obtained from the difference
equation (7), because this states that

$$ u(1, 0) + u(-1, 0) + u(0, 1) + u(0, -1) - 4u(0, 0) = 1 $$

or

$$ 4u(1, 0) = 1 $$

or

$$ u(1, 0) = \tfrac14 $$
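The double integral (13) can also be evaluated numerically. The following is a rough sketch using a midpoint rule; it assumes the second form of (13) and uses the fact that the integrand has period $\pi$ in both variables, so that the average may be taken over $0 \le \phi, \psi \le \pi$. The function name and the grid size `N` are illustrative.

```python
from math import pi, sin

def u_midpoint(m, n, N=200):
    """Midpoint-rule estimate of (13): the mean over [0, pi]^2 of
    (1/2) sin^2(m*phi + n*psi) / (sin^2(phi) + sin^2(psi))."""
    h = pi / N
    total = 0.0
    for i in range(N):
        phi = (i + 0.5) * h
        for j in range(N):
            psi = (j + 0.5) * h
            total += 0.5 * sin(m * phi + n * psi) ** 2 / (sin(phi) ** 2 + sin(psi) ** 2)
    return total / N**2

print(u_midpoint(0, 0), u_midpoint(1, 0), u_midpoint(1, 1))
```

On a grid that is symmetric in $\phi$ and $\psi$ the symmetry argument of the text applies to the sum itself, so the estimate of $u(1, 0)$ equals $\tfrac14$ up to rounding; other values converge more slowly because the integrand is discontinuous at the corners of the square.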

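3b. Returning to (12), if we make the substitutions

$$ \exp(i\phi) = x, \qquad \exp(i\psi) = y $$

it takes the form of the following double contour integral

$$
\begin{aligned}
u(m, n) &= (2\pi i)^{-1}\oint_{|x|=1}x^{-1}\,dx\cdot(2\pi i)^{-1}\oint_{|y|=1}y^{-1}\,dy\;(1 - x^{\pm m}y^{\pm n})\,[4 - (x + x^{-1}) - (y + y^{-1})]^{-1} \\
&= (2\pi i)^{-1}\oint_{|x|=1}dx\cdot(2\pi i)^{-1}\oint_{|y|=1}dy\;(1 - x^{\pm m}y^{\pm n})\,[4xy - y(x^2 + 1) - x(y^2 + 1)]^{-1}
\end{aligned}
$$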


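3c. A much more manageable form for (12) is obtained if we
change to the diagonal coordinates

$$ \mu = m + n, \qquad \nu = m - n $$

Herewith (12) becomes

$$ u(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\phi\cdot(2\pi)^{-1}\int_0^{2\pi}d\psi\,\{1 - \exp[i(\pm\tfrac12\mu(\phi + \psi) \pm \tfrac12\nu(\phi - \psi))]\}\,[4 - 2\cos\phi - 2\cos\psi]^{-1} $$

If we further write the denominator as

$$ 4 - 4\cos\frac{\phi + \psi}{2}\cos\frac{\phi - \psi}{2} $$

and put

$$ (\phi + \psi)/2 = \alpha, \qquad (\phi - \psi)/2 = \beta $$

$u(m, n)$ obtains the form

$$ u(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\alpha\cdot(2\pi)^{-1}\int_0^{2\pi}d\beta\;\tfrac14[1 - \exp(i(\pm\mu\alpha \pm \nu\beta))]\,[1 - \cos\alpha\cos\beta]^{-1} \tag{15} $$

or

$$ u(m, n) = (2\pi)^{-1}\int_0^{2\pi}d\alpha\cdot(2\pi)^{-1}\int_0^{2\pi}d\beta\;\tfrac14[1 - \cos(\pm\mu\alpha \pm \nu\beta)]\,[1 - \cos\alpha\cos\beta]^{-1} \tag{16} $$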
Incidentally we note that if herein we substitute again

$$ \exp(i\alpha) = x, \qquad \exp(i\beta) = y $$

it takes the form

$$ u(m, n) = (2\pi i)^{-1}\oint_{|x|=1}dx\cdot(2\pi i)^{-1}\oint_{|y|=1}dy\;[1 - x^{\pm\mu}y^{\pm\nu}]\,[4xy - (x^2 + 1)(y^2 + 1)]^{-1} $$

The latter expression agrees with Sobolev’s result (as reported
in Math. Rev. 14, 987 (1953)), but only if we there change $m$ and $n$
into $\mu$ and $\nu$. We now introduce a little artifice which seems
essential. We note that the integrand of (15) allows us to replace
$(2\pi)^{-1}\int_0^{2\pi}d\alpha$ by $\pi^{-1}\int_0^{\pi}d\alpha$ because in the range $0 < \alpha < \pi$ already
the terms $\cos\alpha$ and $\exp(\pm i\mu\alpha)$ obtain all possible relative values.
Hence we may write (15) as

$$ u(m, n) = \pi^{-1}\int_0^{\pi}d\alpha\cdot(2\pi)^{-1}\int_0^{2\pi}d\beta\;\tfrac14[1 - \exp(i(\pm\mu\alpha \pm \nu\beta))]\,[1 - \cos\alpha\cos\beta]^{-1} \tag{17} $$

We can now effect the integration with respect to $\beta$ with the help
of a contour integral by writing

$$ \exp(i\beta) = s $$

Substitution yields

$$
\begin{aligned}
u(m, n) &= \pi^{-1}\int_0^{\pi}d\alpha\cdot(2\pi i)^{-1}\oint_{|s|=1}s^{-1}\,ds\;\tfrac14[1 - \exp(\pm i\mu\alpha)\,s^{\pm\nu}]\,[1 - \tfrac12\cos\alpha\,(s + s^{-1})]^{-1} \\
&= \pi^{-1}\int_0^{\pi}\frac{d\alpha}{\cos\alpha}\cdot(2\pi i)^{-1}\oint_{|s|=1}ds\;\tfrac12[\exp(\pm i\mu\alpha)\,s^{\pm\nu} - 1]\,[s^2 - 2s/\cos\alpha + 1]^{-1} \\
&= \pi^{-1}\int_0^{\pi}\frac{d\alpha}{\cos\alpha}\cdot(2\pi i)^{-1}\oint_{|s|=1}ds\;\tfrac12[\exp(\pm i\mu\alpha)\,s^{\pm\nu} - 1]\,[s - (1 - \sin\alpha)/\cos\alpha]^{-1}[s - (1 + \sin\alpha)/\cos\alpha]^{-1}
\end{aligned}
\tag{18}
$$


In the reduced integration range of $\alpha$, $\sin\alpha$ is positive, so that the
pole at $s = s_1 = (1 - \sin\alpha)/\cos\alpha$ lies inside the unit circle $|s| = 1$,
whereas the pole $s = s_2 = (1 + \sin\alpha)/\cos\alpha$ lies outside this circle.
Further, in order not to be obliged to consider also a pole originating
from $s^{-\nu}$, we are free to replace $s^{\pm\nu}$ by $s^{|\nu|}$. We are therefore
left with the only pole at $s = s_1$. Taking its residue we can reduce
(18) to

$$
\begin{aligned}
u(m, n) &= \pi^{-1}\int_0^{\pi}\tfrac12\{\exp(\pm i\mu\alpha)[(1 - \sin\alpha)/\cos\alpha]^{|\nu|} - 1\}\cdot[(1 - \sin\alpha)/\cos\alpha - (1 + \sin\alpha)/\cos\alpha]^{-1}(\cos\alpha)^{-1}\,d\alpha \\
&= \pi^{-1}\int_0^{\pi}\tfrac14(\sin\alpha)^{-1}\,d\alpha\,\{1 - \exp(\pm i\mu\alpha)[(1 - \sin\alpha)/\cos\alpha]^{|\nu|}\}
\end{aligned}
\tag{19}
$$

Finally the substitution

$$ \exp(i\alpha) = (s + i)/(s - i) $$

gives

$$ d\alpha = -2\,ds/(s^2 + 1), \qquad \sin\alpha = 2s/(s^2 + 1), \qquad \cos\alpha = (s^2 - 1)/(s^2 + 1) $$

so that, when $\alpha$ runs from 0 to $\pi$, $s$ runs from $\infty$ to 0.
Thus (19) becomes

$$ u(m, n) = (4\pi)^{-1}\int_0^{\infty}s^{-1}\,ds\,\{1 - [(s + i)/(s - i)]^{\pm\mu}\,[(s - 1)/(s + 1)]^{|\nu|}\} \tag{20} $$

which is our final form of $u(m, n)$, the convergence of which is
easily established because at $s = 0$ and $s = \infty$ the form inside the
bracket vanishes and there are no singularities on the positive real
axis.


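4. The expression (20) enables us at once to obtain in closed form
the values of $u(m, n)$ on the diagonals. There we may take $\mu = 0$,
so that $|\nu| = 2|n|$. Thus (20) becomes on the diagonals

$$ u(n, n) = (4\pi)^{-1}\int_0^{\infty}s^{-1}\,ds\,\{1 - [(s - 1)/(s + 1)]^{2|n|}\} \tag{21} $$

Substituting $(s - 1)/(s + 1) = t$, we obtain

$$ u(n, n) = (2\pi)^{-1}\int_{-1}^{1}(1 - t^{2|n|})(1 - t^2)^{-1}\,dt = \pi^{-1}\int_0^{1}(1 - t^{2|n|})(1 - t^2)^{-1}\,dt \tag{22} $$

which, with the further substitution

$$ t = \exp(-\tfrac12 x) $$

yields for the value of $u(n, n)$ on the diagonals

$$
\begin{aligned}
u(n, n) &= (2\pi)^{-1}\int_0^{\infty}\exp(\tfrac12 x)(\exp x - 1)^{-1}[1 - \exp(-|n|x)]\,dx \\
&= (2\pi)^{-1}\int_0^{\infty}\{1 - \exp[-(|n| - \tfrac12)x]\}(\exp x - 1)^{-1}\,dx - (2\pi)^{-1}\int_0^{\infty}[1 - \exp(\tfrac12 x)](\exp x - 1)^{-1}\,dx \\
&= (2\pi)^{-1}\{\psi(|n| - \tfrac12) - \psi(-\tfrac12)\}
\end{aligned}
\tag{23}
$$

where

$$ \psi(x) = \frac{d}{dx}\log\Pi(x) = \frac{d}{dx}\log\Gamma(x + 1) $$

Therefore (23) gives the values of $u(n, n)$ on the diagonals
explicitly in terms of the $\psi$-function.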
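Moreover, if in (22) we write

$$ \frac{1 - t^{2|n|}}{1 - t^2} = 1 + t^2 + t^4 + \ldots + t^{2|n| - 2} $$

we also obtain for $u(n, n)$ on the diagonals

$$ u(n, n) = \frac{1}{\pi}\left(1 + \frac13 + \frac15 + \frac17 + \ldots + \frac{1}{2|n| - 1}\right) \tag{24} $$

We specially note from (24) the values

$$
\begin{aligned}
u(1, 1) &= \frac{1}{\pi} \\
u(2, 2) &= \frac{4}{3}\cdot\frac{1}{\pi} \\
u(3, 3) &= \frac{23}{15}\cdot\frac{1}{\pi} \\
u(4, 4) &= \frac{176}{105}\cdot\frac{1}{\pi} \\
u(5, 5) &= \frac{1689}{945}\cdot\frac{1}{\pi}
\end{aligned}
$$

It is of interest here to compare these values with the ones just
obtained, i.e.,

$$ u(0, 0) = 0, \qquad u(1, 0) = \tfrac14 $$

whence we see that both rational and transcendental numbers
present themselves.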
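The rational factors in the diagonal values are partial sums of the reciprocals of the odd numbers, so they are easy to reproduce exactly. A minimal sketch with an illustrative function name:

```python
from fractions import Fraction

def diagonal_coefficient(n):
    """Rational c_n such that u(n, n) = c_n / pi, following (24)."""
    return sum((Fraction(1, 2 * k - 1) for k in range(1, abs(n) + 1)), Fraction(0))

for n in range(1, 6):
    print(n, diagonal_coefficient(n))
```

The coefficient for $n = 5$ prints in lowest terms as 563/315, which is the same number as 1689/945.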
Equation (23) further enables us to easily obtain the asymptotic
behavior of $u(n, n)$ for large $|n|$, for it is known that

$$ \psi(x) \approx \log x + 1/(2x), \qquad (x \to \infty) $$

Because, further,

$$ \psi(-\tfrac12) = -\gamma - 2\log 2 $$

(23) becomes for $|n| \to \infty$

$$ u(n, n) \approx (2\pi)^{-1}\{\log(|n| - \tfrac12) + (2|n| - 1)^{-1} + \gamma + 2\log 2\} $$

or

$$ u(n, n) \approx (2\pi)^{-1}\log|n| $$

which therefore shows its logarithmic behavior (at infinity), at
least on the diagonals, as alluded to in Section 3.

In order to find the value of $u(n, n)$ on the diagonals, we took
$\mu = 0$, $\nu = 2n$. We might also have taken $\mu = 2n$, $\nu = 0$. If
we substitute the latter values, (16) can be written as follows

$$
\begin{aligned}
u(n, n) &= (2\pi)^{-1}\int_0^{2\pi}d\alpha\cdot(2\pi)^{-1}\int_0^{2\pi}d\beta\;\tfrac14(1 - \cos 2n\alpha)/(1 - \cos\alpha\cos\beta) \\
&= (4\pi)^{-1}\int_0^{\pi}d\alpha\,(1 - \cos 2n\alpha)/\sin\alpha = (2\pi)^{-1}\int_0^{\pi}(\sin^2 n\alpha/\sin\alpha)\,d\alpha
\end{aligned}
\tag{25}
$$

It is of interest to compare expression (25) for the two-dimensional
diagonal values with (13a) pertaining to the one-dimensional
problem, which latter has the same form as (25), except that in the
denominator, $\sin\alpha$ is replaced by $\sin^2\alpha$.


5. Summarizing we have, so far, obtained the following simple 
numerical results: 


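$$
\begin{aligned}
u(0, 0) &= 0 \\
u(1, 0) &= \tfrac14 \\
u(n, n) &= \frac{1}{\pi}\left(1 + \frac13 + \frac15 + \frac17 + \ldots + \frac{1}{2|n| - 1}\right)
\end{aligned}
\tag{26}
$$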

Other numerical values of $u(m, n)$ might be obtained by evaluating
the relatively simple integral (20). But there exists a more
straightforward procedure, which is based on the preceding
explicitly known values together with our basic difference equation
(7). In Fig. 2 we have plotted the values as given by (26), whereas
the still unknown values are marked as a, b, c, d, .... Due to the
symmetry of our problem we are sure that, e.g., the value a occurs
eight times, as marked in the figure; b and c occur four times, d
and e occur eight times, etc.

[Figure 2. Square lattice showing the known values from (26) and the still unknown values marked a, b, c, d, e.]

If we now consider a diamond with $1/\pi$ at its center, as marked by
dotted lines in the figure, the difference equation (7) tells us that
the value at the center should be the average of those at its four
corners. Hence

$$ \tfrac14\left(a + \tfrac14 + a + \tfrac14\right) = \frac{1}{\pi} $$

or

$$ a = 2/\pi - \tfrac14 $$


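Similarly considering next a diamond with its center at $\tfrac14$ we should
have

$$ \tfrac14\left(0 + \frac{1}{\pi} + \frac{1}{\pi} + b\right) = \tfrac14 $$

and hence

$$ b = 1 - 2/\pi $$

Knowing a and b we next consider a diamond with its center at a.
This enables us to write

$$ \tfrac14\left(\frac{1}{\pi} + \frac{4}{3\pi} + b + d\right) = a $$

or

$$ \tfrac14\left(\frac{7}{3\pi} + 1 - \frac{2}{\pi} + d\right) = \frac{2}{\pi} - \tfrac14 $$

or

$$ d = \frac{23}{3\pi} - 2 $$

Similarly we find

$$ e = \frac{2}{3\pi} + \tfrac14, \qquad c = \frac{17}{4} - \frac{12}{\pi} $$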

and so on. 

Hence, given the values of (26), the determination of the val- 
ues of u(m, n) at all other lattice points is a matter of elementary 
arithmetic. The numerical values thus found, are given in Fig. 3. 
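The elementary arithmetic described above can be automated. The sketch below is one possible ordering of the steps, not necessarily the one used to prepare Fig. 3: it stores each value as a pair $(r, c)$ standing for $r + c/\pi$, starts from the diagonal values (26) and $u(1, 0) = \tfrac14$, and fills one column at a time from the difference equation (7) together with the symmetries $u(\pm m, \pm n) = u(m, n) = u(n, m)$. All names are illustrative.

```python
from fractions import Fraction

def lattice_potentials(size):
    """Return {(m, n): (r, c)} for 0 <= n <= m <= size, meaning u(m, n) = r + c/pi."""
    u = {(0, 0): (Fraction(0), Fraction(0))}

    def get(m, n):
        m, n = abs(m), abs(n)                     # symmetry of the problem
        return u[(max(m, n), min(m, n))]

    diag = Fraction(0)
    for M in range(size):
        diag += Fraction(1, 2 * M + 1)            # diagonal value from (26)
        u[(M + 1, M + 1)] = (Fraction(0), diag)
        for n in range(M, -1, -1):
            if M == 0:
                u[(1, 0)] = (Fraction(1, 4), Fraction(0))
                continue
            if n == M:
                # equation (7) centered at (M, M): 2u(M+1, M) + 2u(M, M-1) = 4u(M, M)
                terms = [(2, get(M, M)), (-1, get(M, M - 1))]
            else:
                # equation (7) centered at (M, n), solved for u(M+1, n)
                terms = [(4, get(M, n)), (-1, get(M - 1, n)),
                         (-1, get(M, n + 1)), (-1, get(M, n - 1))]
            u[(M + 1, n)] = (sum(w * v[0] for w, v in terms),
                             sum(w * v[1] for w, v in terms))
    return u

u = lattice_potentials(3)
for key in sorted(u):
    print(key, u[key])
```

Exact fractions are used on purpose: each new value is a difference of nearly equal numbers, so a floating-point version of the same recurrence would be expected to lose accuracy as the lattice grows.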


6. Next we wish to inquire what, in the case of “discrete” 
potentials, becomes of the property of “smooth” potentials, that, 
in a region containing no singularities, the average value of the 
potential over the circumference of any circle equals the potential 
at its center. In order to find two analogies for “discrete”
potentials we can proceed as follows. We mark on a square lattice
the weights of the values to be added, in order to obtain zero, e.g.

             1
         1  -4   1
             1


    23/(15π)    2/(3π)+1/4  23/(3π)-2   17/4-12/π   23/(3π)-2   2/(3π)+1/4  23/(15π)
    2/(3π)+1/4  4/(3π)      2/π-1/4     1-2/π       2/π-1/4     4/(3π)      2/(3π)+1/4
    23/(3π)-2   2/π-1/4     1/π         1/4         1/π         2/π-1/4     23/(3π)-2
    17/4-12/π   1-2/π       1/4         0           1/4         1-2/π       17/4-12/π
    23/(3π)-2   2/π-1/4     1/π         1/4         1/π         2/π-1/4     23/(3π)-2
    2/(3π)+1/4  4/(3π)      2/π-1/4     1-2/π       2/π-1/4     4/(3π)      2/(3π)+1/4
    23/(15π)    2/(3π)+1/4  23/(3π)-2   17/4-12/π   23/(3π)-2   2/(3π)+1/4  23/(15π)

Figure 3.


If we take the point two steps below the point marked $-4$ and
superimpose the weights

            -1
        -1   4  -1
            -1

we obtain

             1
         1  -4   1
            1-1
        -1   4  -1
            -1

so that the weight at the central point becomes zero. Proceeding
further in this way we can obtain the following two theorems
pertaining to “discrete” potentials $u(m, n)$ in all those regions where
they satisfy

$$ (\Delta_m^2 + \Delta_n^2)\,u(m, n) = 0 $$

THEOREM 1. The average value of any discrete potential over
any diamond shaped boundary equals the average value over its
diagonals.

By this it is to be understood that the corners of the diamond
are to be considered as belonging to the boundary only, but these
corners have to be counted as half, while the center, which lies on
both diagonals, is to be counted twice.

THEOREM 2. The average value over the circumference of any
square equals the average value over its diagonals.

Here the corners are understood to belong to both sides and
to the diagonal. If the inside of the square contains an even number
of unit steps the diagonals intersect in the center at a point
belonging to the system, and this point has then to be counted
twice as belonging to both diagonals.


7. In this paragraph we treat a simple physical application of the 
analytical results obtained above. 

Consider a square-meshed wire gauze (see Fig. 4) (like chicken 
gauze but with square meshes instead of hexagonal ones) and let 
the electrical resistance of each side of a square be $R$. It is
supposed that the gauze extends to infinity. Let, as indicated in the
figure, an electrical current $J_0$ enter the gauze at 0 and leave it at
infinity. The latter condition can be visualized by imagining a
large infinitely conducting ring to be soldered on the gauze. (The
radius of this ring is supposed to tend towards infinity.)

[Figure 4. Square-meshed wire gauze with the current $J_0$ entering at the point 0.]

Furthermore, let the resulting potential at the point $(m, n)$ be $V_1(m, n)$.
Then, according to Ohm’s law, the current flowing from the point
$(m, n)$ to the point $(m + 1, n)$ will be


$$ (1/R)\{V_1(m + 1, n) - V_1(m, n)\} $$

The sum of all four currents leaving the point $(m, n)$ must be zero,
in symbols

$$ (1/R)\{V_1(m + 1, n) - V_1(m, n) + V_1(m - 1, n) - V_1(m, n) + V_1(m, n + 1) - V_1(m, n) + V_1(m, n - 1) - V_1(m, n)\} = 0 $$

This is true for all points $(m, n)$ except for the point $(0, 0)$ where
this sum is equal to the current $J_0$ entering it at that point. Hence
$V_1(m, n)$ should satisfy the difference equation

$$ (\Delta_m^2 + \Delta_n^2)\,V_1(m, n) = \begin{cases} RJ_0, & m = n = 0 \\ 0, & \text{everywhere else} \end{cases} \tag{27} $$


Since our problem is linear, it allows superposition. We therefore
may superpose upon the given situation a similar one (Fig. 5),
where now a current $J_0$ leaves at an arbitrary point $(m_1, n_1)$ and
enters at infinity. The currents at infinity then cancel, and we
are therefore left with the case where the current $J_0$ enters the
gauze at the point $(0, 0)$ and leaves it at the point $(m_1, n_1)$ (see
Fig. 6), while at the same time we can forget about the large ring
at infinity. The potential in this second case therefore satisfies
the equation

$$ (\Delta_m^2 + \Delta_n^2)\,V_2(m, n) = \begin{cases} -RJ_0, & m = m_1,\ n = n_1 \\ 0, & \text{otherwise} \end{cases} $$

or

$$ (\Delta_m^2 + \Delta_n^2)\,V_2(m - m_1, n - n_1) = \begin{cases} -RJ_0, & m = n = 0 \\ 0, & \text{otherwise} \end{cases} \tag{28} $$

[Figure 5. The gauze with the current $J_0$ leaving at the point $(m_1, n_1)$ and entering at infinity.]

[Figure 6. The gauze with the current $J_0$ entering at $(0, 0)$ and leaving at $(m_1, n_1)$.]

Subtraction of (28) from (27) gives

$$ (\Delta_m^2 + \Delta_n^2)\,\{V_1(m, n) - V_2(m - m_1, n - n_1)\} = \begin{cases} 2RJ_0, & m = n = 0 \\ 0, & \text{otherwise} \end{cases} \tag{29} $$

Now $V_1(m, n) - V_2(m - m_1, n - n_1)$ is the potential difference
between the two points $(m, n)$ and $(m - m_1, n - n_1)$. Let this
potential difference be called $V_{12}(m, n)$. Then (29) states

$$ (\Delta_m^2 + \Delta_n^2)\,V_{12}(m, n) = \begin{cases} 2RJ_0, & m = n = 0 \\ 0, & \text{otherwise} \end{cases} $$

But again according to Ohm’s law this potential difference
$V_{12}(m, n)$ equals $J_0$ times the effective resistance $R_e\{(m, n), (m - m_1, n - n_1)\}$
which one would measure experimentally between the points
$(m, n)$ and $(m - m_1, n - n_1)$. Hence the effective resistance is
governed by the equation

$$ (\Delta_m^2 + \Delta_n^2)\,R_e\{(m, n), (m - m_1, n - n_1)\} = \begin{cases} 2R, & m = m_1,\ n = n_1 \\ 0, & \text{otherwise} \end{cases} $$

or

$$ (\Delta_m^2 + \Delta_n^2)\,R_r\{(m, n), (m - m_1, n - n_1)\} = \begin{cases} 2, & m = m_1,\ n = n_1 \\ 0, & \text{otherwise} \end{cases} \tag{30} $$


Equation (30) is almost exactly our fundamental partial difference
equation (7) for “discrete potentials”; only, on the right-hand side
we have 2 instead of 1. Hence the effective relative resistance as
one would measure between the points $(m, n)$ and $(m - m_1, n - n_1)$
equals $2u(m_1, n_1)$. Taking $m = n = 0$ we obtain for the effective
relative resistance between the points $(m_1, n_1)$ and (0, 0) the value
$2u(m_1, n_1)$. Hence if the actual resistance $R$ of each side is one
Ohm, we would measure between points (0, 0) and (0, 1) an effective
resistance $2u(0, 1) = 1/2$. Of course the same effective resistance
would be obtained between any two neighboring points, i.e.,
between any point and the next one to the left, the right, above, or
below it. Similarly, between the point (0, 0) and the first diagonal
point (1, 1) the effective resistance is $2 \cdot 1/\pi = 2/\pi$. Again the
effective resistance between two points separated by a “Knight’s
move” (i.e., between $(m, n)$ and $(m + 2, n + 1)$) is $4/\pi - 1/2$.

Assuming the actual resistance of each side to be one Ohm,
the effective resistances, as they would be measured between the
point (0, 0) and any other point (m, n), are therefore equal to
twice the values given in Fig. 3.
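
The effective resistances of this section can be checked numerically. The following is a minimal sketch and not part of the original derivation: it assumes the standard double-integral representation of the effective resistance on an infinite square gauze, $R_{\mathrm{eff}}(m, n)/R = (2\pi)^{-2} \iint [1 - \cos(mx + ny)]/(2 - \cos x - \cos y)\, dx\, dy$ over $(-\pi, \pi)^2$, which is not taken from the text, and evaluates it by a midpoint rule. The names `effective_resistance` and `second_difference` and the parameter `grid` are hypothetical. With one-ohm sides the sketch should return approximately 1/2 for neighboring points, $2/\pi$ for the first diagonal point, and $4/\pi - 1/2$ for a knight's move, while the second difference should come out as 2 at the origin and 0 elsewhere.

```python
import numpy as np


def effective_resistance(m, n, grid=400):
    """Effective resistance between (0, 0) and (m, n), in units of R.

    Assumption: uses the standard double-integral representation for the
    infinite square gauze (not taken from the text), evaluated with a
    midpoint rule on a grid x grid mesh over (-pi, pi)^2.
    """
    if grid % 2:
        raise ValueError("grid must be even so that no node falls on the origin")
    h = 2.0 * np.pi / grid
    # Midpoints of an even mesh never coincide with x = y = 0,
    # where the integrand takes the indeterminate form 0/0.
    x = -np.pi + (np.arange(grid) + 0.5) * h
    X, Y = np.meshgrid(x, x, indexing="ij")
    integrand = (1.0 - np.cos(m * X + n * Y)) / (2.0 - np.cos(X) - np.cos(Y))
    # (2 pi)^-2 times the integral equals the mean of the integrand.
    return float(integrand.mean())


def second_difference(m, n, grid=400):
    """Sum of the second central differences in m and n of the relative resistance."""
    r = lambda a, b: effective_resistance(a, b, grid)
    return r(m + 1, n) + r(m - 1, n) + r(m, n + 1) + r(m, n - 1) - 4.0 * r(m, n)


if __name__ == "__main__":
    for point in [(0, 1), (1, 1), (2, 1)]:
        print(point, round(effective_resistance(*point), 4))
    print("second difference at (0, 0):", round(second_difference(0, 0), 6))
    print("second difference at (1, 0):", round(second_difference(1, 0), 6))
```

The midpoint rule is chosen only because it sidesteps the removable singularity at the origin; any quadrature that avoids that point would serve equally well.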

Finally according to (24) we have on a diagonal 


2u(1, 1) = 0.635... 
2u(2,2) = 0.850... 
2u(3, 3) = 0.976... 
2u(4,4) = 1.065... 


Hence in the gauze we have to go three or four steps away along a 
diagonal in order to find an effective resistance about equal to the 
actual resistance of each branch. 
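
As a rough cross-check of the diagonal values, one may use the closed form $2u(n, n) = (2/\pi) \sum_{k=1}^{n} 1/(2k - 1)$. This is a standard result for the infinite square gauze and is assumed here; it is not formula (24) of the text, which is not reproduced in this section. The function name `diagonal_resistance` is hypothetical. The sketch below gives roughly 0.637, 0.849, 0.976 and 1.067 for n = 1, ..., 4, which agree with the figures quoted above to within a few units in the third decimal.

```python
import math


def diagonal_resistance(n, R=1.0):
    """Effective resistance between (0, 0) and (n, n) for side resistance R.

    Assumption: closed form (2R/pi) * sum_{k=1}^{n} 1/(2k - 1), a standard
    result for the infinite square gauze, not taken from the text.
    """
    return (2.0 * R / math.pi) * sum(1.0 / (2 * k - 1) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, round(diagonal_resistance(n), 3))
```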


8. In this paragraph we give a few references to the literature 
without aiming at completeness. In the first place we refer to 
R. Courant, K. O. Friedrichs, and H. Lewy, Die partiellen Differen- 
zengleichungen der mathematischen Physik, Math. Ann., 100, 
32-74 (1928). Integrals closely related to (9) were encountered
by H. Kramers in his researches on ferromagnetism; see Proc.
Acad. Amst., 37, 378 (1934).
Solutions of $(\Delta_m^2 + \Delta_n^2)\, u(m, n)$ of the form


$$u(m, n) = \int \left[\frac{s + z_1}{s + z_3}\right]^{m+n} \left[\frac{s + z_4}{s + z_2}\right]^{m-n} \varphi(s)\, ds$$


were obtained by the present author in J. Inst. Electr. Eng. (London),
81, 381 (1937). Integrals of the form (12), but extended to
three dimensions, also in connection with magnetic anisotropy,
were considered by W. F. van Peype, Physica, 5, 465 (1938); see
also G. N. Watson, Quart. J. Math., 10, 266 (1939). Further, an
extensive treatment of the “discrete” wave problem and the
“discrete” potential problem, both in two dimensions, can be found in
the book by Balth. van der Pol and H. Bremmer, Operational
Calculus Based on the Two-Sided Laplace Integral, Cambridge
Univ. Press, London, 1950. In the present paper, extensive use
was made of results there obtained. Reference should also be
made to A. Stöhr, Math. Nachr., 3, 295 (1950), where the
“discrete” two-dimensional wave equation is studied, and also to the
same journal, 3, 330 (1950), where the two-dimensional “discrete”
potential equation is considered (see Math. Rev., 12, 711 (1951)).
Finally we refer to S. L. Sobolev, Doklady Akad. Nauk SSSR,
(5) 87, 179, 341 (1952), a review of which appeared in Math. Rev.,
14, 987 (1953). It is to the latter review that we referred in
our text. The original (in Russian) was not available to us.


Notes and Bibliography 


Chapter I 


§ 1. The idea that probability theory can be formalized on the basis of
measure theory occurs first in the classical paper of E. Borel “Sur les
probabilités dénombrables et leurs applications arithmétiques,” Rend. Circ.
Mat. Palermo 27 (1909), pp. 247-271. It is Borel who discovered “strong
laws” of large numbers (see §5). The first axiomatization of a significant
portion of probability theory was given by H. Steinhaus in his pioneering
paper “Les probabilités dénombrables et leur rapport à la théorie de la
mesure,” Fund. Math. 4 (1922), pp. 286-310. The most complete
axiomatization is due to Kolmogoroff and can be found in his Ergebnisse tract
Grundbegriffe der Wahrscheinlichkeitsrechnung (1933). It is not widely
known that axiomatization of probability theory was one of the problems
on the famous list of Hilbert.


§ 2. The derivation of this section is due to Maxwell but is often
attributed to Borel. A proof of the “weak law of large numbers” is obtained by
showing that


$$\frac{1}{S_{3N}(R)} \int \Bigl| \frac{1}{N} \sum_{k=1}^{N} \chi_{\alpha,\beta}(v_{kx}) - \mathrm{Prob}\{\alpha < v_{1x} < \beta\} \Bigr|^{2}\, d\sigma$$


approaches 0 as $N \to \infty$ (remember that $R$ depends on $N$!).


§ 3. The presentation of this section follows our papers “On the
average number of real roots of a random algebraic equation” which have
appeared in the Bull. Amer. Math. Soc. 49 (1943), pp. 314-320 (and p. 938)
and Proc. London Math. Soc. 50 (1948), pp. 390-408. The paper by Erdös
and Offord appeared in the Proc. London Math. Soc. 6 (1956), pp. 139-160.
The subject of zeros of “random functions” is of considerable interest and
importance, especially in the theory of random noise and related phenomena.
The most important contributions are due to S. O. Rice, “Mathematical
theory of random noise,” Bell System Tech. J. 23 (1944), pp. 282-332 and
24 (1945), pp. 46-156.


§ 4. The elegant formula (1.4.13) was discovered by A. Rényi, “On the
density of certain sequences of integers,” Acad. Serbe. Bull. Acad. Sci. Mat.
Nat. 8 (1955), pp. 157-162. The proof reproduced here follows closely our
note “A remark on the preceding paper by A. Rényi,” ibid., pp. 163-165.
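If we denote by $M\{f(n)\}$ the limit (if it exists)


$$\lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} f(n),$$


then the crux of our proof is the equation


$$M\Bigl\{ e^{i\xi \sum_{k=1}^{r} \beta_{p_k}(n)} \Bigr\} = M\Bigl\{ \prod_{k=1}^{r} e^{i\xi \beta_{p_k}(n)} \Bigr\} = \prod_{k=1}^{r} M\bigl\{ e^{i\xi \beta_{p_k}(n)} \bigr\}.$$
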
This follows by an elementary calculation from (1.4.6) and is simply a 
repetition of the familiar argument used to prove that the expectation of a 
product of independent random variables is the product of expectations. 


§ 6. See G. Szekeres and P. Turán, “Über ein Extremalproblem in der
Determinantentheorie” (Hungarian with a German summary), Mat.
Természett. Értes. 56 (1937), pp. 796-804.


Chapter II 


§ 1. The random walk problem of this section for the special case
$p(\alpha) = (2\pi)^{-1}$ $(0 \le \alpha < 2\pi)$ has received considerable attention in the
literature. It is particularly simple because the successive displacements
are, in this case, independent. For treatments of this case see
S. Chandrasekhar, “Stochastic problems in Physics and Astronomy,” Rev. Modern
Phys. 15 (1943), pp. 1-89 (reprinted in Selected Papers on Noise and
Stochastic Processes, Dover, New York) and S. O. Rice, “Distribution of a sum of
n sine waves,” Bell Tel. System Tech. Publ. Monograph 2365. The theorem
of Paul Lévy (called continuity theorem for Fourier-Stieltjes transforms)
can be found in many standard texts. See, e.g., M. Loève, Probability
Theory, Van Nostrand, New York, 1955. The theorem on moments was
first proved by Tchebycheff. It was considerably extended by Markoff (see,
e.g., his Wahrscheinlichkeitsrechnung, Leipzig, 1912, which, in my opinion, is
still one of the best and most imaginative books in probability). Condition
(II.1.5) was found by Carleman.


§ 2. As far as I know the use of perturbation theory in proving a limit
theorem is new. A rigorous justification of perturbation calculations is not
always easy, but for our purposes the treatment of Rellich, “Störungstheorie
der Spektralzerlegung I and II,” Math. Ann. 113 (1937), pp. 600-619 and
pp. 677-685 is quite sufficient (the same applies to §3).


§ 3. Although the model of a polymer chain considered in this section 
is attributed to Eyring the corresponding random walk problem was already 
considered by Smoluchowski in one of his earliest papers on Brownian 
motion. The calculation of this section is essentially due to Smoluchowski. 


§ 4. The first proof of the main result of this section was given by
Moran, “The statistical distribution of the length of a rubber molecule,”
Proc. Cambridge Phil. Soc. 44 (1948), pp. 342-344, who used a general
theorem of S. Bernstein. Our proof (based on perturbation theory) is much
more analytic but also more direct.


§ 5. The theorems of this section are due to W. Feller, “Fluctuation
theory of recurrent events,” Trans. Amer. Math. Soc. 67 (1949), pp. 98-119,
but we follow the presentation of D. A. Darling and M. Kac, “On occupation
times for Markov processes,” Trans. Amer. Math. Soc. 84 (1957), pp.
444-458.


§ 6. The mathematical form of the uncertainty principle is due to 
H. Weyl. 


§ 7, 8. The presentation of these sections follows our paper “Toeplitz
matrices, translation kernels and a related theorem in probability theory,”
Duke Math. J. 21 (1954), pp. 501-509. Szegö’s result was published in “On
certain hermitian forms associated with the Fourier series of a positive
function,” Festskrift Marcel Riesz, Lund, 1952, pp. 228-238. The problem
of determining the limit (II.7.6) originated with Onsager in connection with
his work on the two-dimensional Ising model. Spitzer’s work is contained
mainly in his paper “A combinatorial lemma and its application to
probability theory,” Trans. Amer. Math. Soc. 82 (1956), pp. 323-339. The idea
of using purely combinatorial methods in treating certain probabilistic
problems originated with E. S. Andersen, whose work (cited in Spitzer’s
paper above) strongly influenced Spitzer.


§ 9. The theorem of the section was first proved by P. Erdös and M.
Kac. For a recent survey of the whole field of applications of probability
methods to number theory see the expository article by Kubilyus in Uspehi
Matem. Nauk. 11 (1956), pp. 31-66 (in Russian).


Chapter III 


§ 1, 2. Boltzmann summarized most (but not all) of his work in a
two-volume treatise Vorlesungen über Gastheorie. This is one of the greatest
books in the history of exact sciences and the reader is strongly advised to
consult it. It is tough going but the rewards are great.


§ 3. For references to the polemic between Boltzmann and Zermelo see 
the classic article of P. and T. Ehrenfest in the Enc. Math. Wiss. 1911. 
This celebrated exposition contains the most penetrating analysis of con- 
ceptual foundations of statistical mechanics and it must be read by anyone 
who contemplates a serious study of the subject. 


§ 4, 5. The proof of Poincaré’s theorem as well as the calculation of
the mean recurrence time follows our note “On the notion of recurrence in
discrete stochastic processes,” Bull. Amer. Math. Soc. 53 (1947), pp.
1002-1010. The first proof of (III.5.1) was given by G. D. Birkhoff, “Proof of
a recurrence theorem for strongly transitive systems,” Proc. Nat. Acad. Sci.
U.S.A. 17 (1931), pp. 650-655. This note was so overshadowed by
Birkhoff’s subsequent proof of the ergodic theorem that it escaped notice.
Smoluchowski’s work on mean recurrence times is summarized in his
beautiful lectures “Drei Vorträge über Diffusion, Brownsche Molekularbewegung
und Koagulation von Kolloidteilchen,” Phys. Z. 17 (1916), pp. 557-571
and 587-599.


§ 6. Kolmogoroff’s theorem is proved in his book cited above. 


§ 7. For the original description of the model see P. and T. Ehrenfest,
“Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem,”
Phys. Z. 8 (1907), pp. 311-314.

The graph is taken from the second volume of Cl. Schaefer’s textbook
Einführung in die Theoretische Physik.


§ 8. This section follows closely mimeographed notes of Higgins 
Lectures given by G. E. Uhlenbeck at Princeton University in the fall of 
1954. 


§ 9. The reader may have noticed that we do not discuss in any detail 
the ergodic theorem. The role of this theorem in statistical mechanics has 
been greatly exaggerated and we have tried to play it down somewhat. 

For an excellent account of ergodic theory see P. R. Halmos, Lectures 
on Ergodic Theory published in 1956 by the Mathematical Society of Japan. 
The summary of Gibbs’s ideas follows closely Uhlenbeck’s Higgins Lectures 
cited above. 


§ 10. The derivation of this section follows our paper “Random walk 
and the theory of Brownian motion,” Amer. Math. Monthly 54 (1947), 
pp. 369-391. 

It is curious to note that if one tries to use the method of this section to 
find first the right eigenvectors one runs into difficulties. 


§ 11. A. J. F. Siegert, “On the approach to statistical equilibrium,” 
Phys. Rev. 76 (1949), pp. 1708-1714. 

F. G. Hess, “Alternative solution to the Ehrenfest problem,” Amer.
Math. Monthly 61 (1954), pp. 323-327.


§ 12. Formulas for $\langle n_A(s) \rangle$ and $\langle n_A^2(s) \rangle$ can be found, e.g., in M. C.
Wang and G. E. Uhlenbeck, “On the theory of Brownian motion II,” Rev.
Modern Phys. 17 (1945), pp. 323-342.


§ 13. See “Random walk and the theory of Brownian motion” cited 
above and for the discussion of entropy M. J. Klein, “Entropy and the 
Ehrenfest urn model,” Physica 22 (1956), pp. 569-575. 


§ 14. This section reproduces almost verbatim our note “Some re- 
marks on the use of probability in classical statistical mechanics” Acad. 
Roy. Belgique. Bull. Cl. Sci. 42 (1956), pp. 356-361. 


§ 16, 17, 18. These sections reproduce almost verbatim portions of
our paper “Foundations of kinetic theory,” Proc. Third Berkeley Symp. on
Math. Stat. and Prob., Vol. 3, pp. 171-197.


§ 19. See R. Brout, “Statistical mechanics of irreversible processes,
Part VII: Boltzmann equation,” Physica 22 (1956), pp. 509-524. In a
forthcoming review article written for the new Handbuch der Physik, H. Grad
gives a detailed critique of the work of Brout and M. S. Green.


§ 21. A summary of Smoluchowski’s work can be found in his “Drei
Vorträge ...” cited above (§5 of this chapter).

For a more recent review see the paper of S. Chandrasekhar cited in §1 
of Chapter II. 


§ 22, 23. The material of these sections as well as some portions of
subsequent sections dates back to the Summer of 1946 when the author was
a John Simon Guggenheim Memorial Fellow at the University of Michigan.

The results were obtained in collaboration with Uhlenbeck who first
drew the author’s attention to this complex of problems. Although not
previously published they formed the main part of an invited address delivered by
the author at the University of Oregon in June 1952.

Recently D. V. Lindley in the paper “The estimation of velocity
distributions from counts,” Proc. Int. Congress, Amsterdam, 1954, pp. 427-444
derived formula (III.23.2) using a different method and gave an interesting
discussion of the work of Lord Rothschild on determination of the average
speed of spermatozoa. Due to limitation of time the material of §§21-28
was not presented at the Boulder Seminar.


§ 24. Formulas (III.24.4) and (III.24.5) were originally derived by
Smoluchowski in a complicated way. A simple and elegant derivation was
given by S. Chandrasekhar in his review article cited above.


§ 26. For references to the work of R. Fürth see the review article of
S. Chandrasekhar.


§ 27. For a statement and applications of the “inclusion-exclusion”
principle see, e.g., the excellent textbook of W. Feller, Introduction to
Probability Theory and Its Applications published by Wiley and Sons.


Chapter IV 


§ 1, 2. Wiener’s approach can be found in the chapter on random
functions of the book “Fourier transforms in complex domain,” Amer. Math.
Soc. Coll. Publ., written jointly with R. C. A. Paley. Doob’s approach can be
found in his book Stochastic Processes published by Wiley and Sons.

How much “fuss” over measure theory is necessary for probability
theory is a matter of taste. Personally, I prefer as little fuss as possible
because I firmly believe that probability theory is more closely related to
analysis, physics, and statistics than to measure theory as such.


§ 3. See P. Lévy, “Sur les intégrales, dont les éléments sont des vari- 
ables aléatoires indépendantes,” Ann. Pisa 3 (1934), pp. 337-366. 


§ 4, 5, 6. These sections follow closely our paper “On some
connections between probability theory and differential and integral equations,”
Proceedings of the Second Berkeley Symposium on Mathematical Statistics and
Probability, pp. 189-215. See also D. Ray, “On spectra of second order
differential operators,” Trans. Amer. Math. Soc. 77 (1954), pp. 299-321.


§ 7. This application was first suggested by A. J. F. Siegert in an
abstract “Brownian motion theory as a tool in statistical mechanics,”
Phys. Rev. 86 (1952), p. 621. We follow a subsequent (and independent)
derivation of Yaglom as reproduced in the excellent review article
“Integration in function spaces and its application to quantum physics” (in Russian)
by Gelfand and Yaglom in Uspehi Mat. Nauk. 9 no 67 (1956), pp. 77-114.


INDEX 


Avogadro number, 140, 161 


Bogoliubov, 131, 132, 188, 195 
Bohnenblust, 52 
Boltzmann, 59, 62, 72, 79, 80, 97, 
98, 131 
equation, 82, 110, 192 
property, 113, 116, 118, 123 
Borel, 259 
Born, 188 
Brout, 124, 130, 131, 188, 263 
Brownian motion, 132, 137, 141, 
143, 179 


Canonical ensemble, 104 

Capacitory potential, 181 

Central limit theorem, 56 

Chandrasekhar, 198, 260, 263 

Clausius, 59 

Coarse-grained density, 84 

Conditional probability, 70, 74, 75, 
77 

Curve, of H-theorem, 102 


Darling, 261 

Density, coarse-grained, 84 
Diffusion constant, 139, 161 
Distribution function, 26 
Doob, 70, 162, 163, 164 


Ehrenfest model, 80, 83, 86, 97, 106, 159
Markoffian character of, 95

Ehrenfest, P., 72, 73, 77 

Ehrenfest, T., 72, 73, 77 

Einstein, 161 

Entropy, 61, 74, 97 

Eratosthenes, 13 

Erdös, 10, 261

Ergodic theorem, 82, 83, 84, 262 

Excluded volume, 36 

Eyring, 36 


Feller, W., 186, 261, 263 

Feynman, 167, 221 
integrals, 168 

Fredholm determinant, 55 


Gibbs, 83, 84, 86, 97 
entropy, 85, 98 
Grad, 263 
Grand-canonical ensemble, 104 
Green, H. S., 188 
Green, M. S., 130, 263 
Grobe Dichte, 84 


H-curve, 79, 102 

H-theorem, 61, 72, 118, 123, 124 
Hadamard, 23 

Halmos, 262 

Heads or tails, 18 

Hess, 90 

Hunt, 47, 52 


Irregular point, 181 


Kac, 187, 261 

Karamata, 42 

Kirkwood, 188 

Klein, 97 

Knudsen gas, 159 

Kohn, 198 

Kolmogoroff, 70, 142, 163 


Laplace, 12 
Large numbers, strong law of, 22 
Large numbers, weak law of, 4 
Lévy, 27, 146, 164, 264 
Lindley, 263 
Liouville equation, 84, 104, 106, 
187, 188 

operator, 104 

theorem, 63, 80, 85, 184, 185 
Loschmidt, 61, 72, 74 
Luttinger, 198 


Markoff, 187 
chain, 73, 90 
character of Ehrenfest model, 95 
process, 87, 93, 150, 186 
property, 87, 145 
Master equation, 105, 106, 107, 109, 
110, 111, 131, 188 
Maxwell, 59 
gas, 111 
Mean recurrence time, 68, 80, 158 
Metric transitivity, 66-68, 72 
Microcanonical distribution, 185 
Microcanonical ensemble, 104 
Molecular chaos, 113 
Moran, 261 


Navier-Stokes equation, 194 
Numbers, square-free, 17 


Offord, 10 
Onsager, 261 


Partition function, 175 
Persistence time, 152 
Persistent process, 148, 153 
Perturbation calculation, 30 
theory, 39 
Poincaré cycle, 62, 65, 95, 100, 184 
recurrence theorem, 61, 63, 157, 
184 
Point, irregular, 181 
regular, 181 
Potential theory, 178 
Probability, after-effect, 138 
conditional, 70, 74, 75, 77 
Product measure, 133, 135, 179 


Random walk, 25, 29, 40, 45, 187 
Ray, 175, 264 
Recurrence theorem of Poincaré, 
61, 63, 157, 184 
Recurrence time, 153 
mean, 68, 80, 158 
variance of, 96 
Regular point, 181 
Rényi, 259 
Rice, 259, 260 


Sample space, 1, 3, 6, 12, 18, 20, 25 


Schrödinger equation, 176, 216

Second law of thermodynamics, 98, 
132 

Siegert, 90, 264 

Smoluchowski, 71, 132, 139, 143, 
145, 161 

process, 142 

Spitzer, 52 

Square-free numbers, 17 

Statistical equilibrium, 136 

Steinhaus, 259 

Stochastic process, 68, 71, 133, 134, 
181 

Stosszahlansatz, 72, 99, 110 

Strong law of large numbers, 22 

Strong mixing, 72 

Strongly stationary process, 79 

Svedberg, 132, 139, 140, 145 

Szegö, 51-53, 55 


Tauberian theorem, 42, 174 
Thermal equilibrium, 3 
Thermodynamics, second law of, 
98, 132 
Time, mean recurrence, 68, 80, 158 
persistence, 152 
recurrence, 153 
recurrence, variance of, 96 
Toeplitz matrix, 50 


Uhlenbeck, 102, 124, 131, 262 
Uncertainty principle, 45, 211 


Van Hove, 198 

Variance, of recurrence time, 96 
Verdichtungskurve, 102, 199 
Volume, excluded, 36 


Weak law of large numbers, 4 
Weyl, 261 
Wiederkehrsatz, 61 
Wiener, 165 
integral, 165, 168 
measure, 162, 163 


Yaglom, 264 
Yvon, 188 


Zermelo, 61, 72, 74, 79 


